We consider the decomposition of arbitrary isometries into a sequence of single-qubit and
Controlled-NOT (C-NOT) gates. In many experimental architectures, the C-NOT gate is relatively
'expensive' and hence we aim to keep the number of these as low as possible. We derive a theoretical
lower bound on the number of C-NOT gates required to decompose an arbitrary isometry from m to
n qubits, and give three explicit gate decompositions that achieve this bound up to a factor of about
two in the leading order. We also perform some bespoke optimizations for certain cases where m
and n are small. In addition, we show how to apply our result for isometries to give a decomposition
scheme for an arbitrary quantum operation via Stinespring's theorem, and derive a lower bound on
the number of C-NOTs in this case too. These results will have an impact on experimental efforts
to build a quantum computer, enabling them to go further with the same resources.


I. INTRODUCTION 


Quantum computers would allow us to speed up several important computations including search,
quantum simulation and factoring. The ability to do the latter would render RSA, a widespread
cryptographic protocol, unfit for purpose. However, constructing a device capable of performing
such computations is one of the biggest challenges facing the field, and many candidate platforms
remain in their infancy, operating only with a few qubits at best.

In spite of this, the theory of quantum computation is quite advanced. At an abstract level, a
quantum computation corresponds to a unitary operation, and a universal quantum computer should be
able to perform arbitrary unitary operations (each to very high precision). Rather than having a
different component for each unitary operation, it is convenient to break down such operations in
terms of a small family of simple-to-perform gates. This is the aim of the circuit model of quantum
computation, which mirrors an analogous model for classical computation, in which an arbitrary
computation can be decomposed in terms of (for example) NOT, AND, OR and C-NOT gates. In the quantum
case, several examples of universal gate libraries are known. In this work we focus on one involving
arbitrary single-qubit operations and C-NOT gates. This gate set is universal for quantum
computation in the sense that an arbitrary n-qubit unitary can be decomposed in terms of these gates
alone, and is particularly well-suited to certain architectures in which these operations are
relatively straightforward to implement. Of these operations, C-NOT is often the most difficult to
perform since in all experimental architectures it involves connecting the qubits using an
additional degree of freedom. This provides additional channels for the introduction of decoherence.
The mediated interaction also typically requires longer gate times, increasing susceptibility to
direct qubit decoherence. As an example, the current lowest infidelities achieved experimentally are
$< 10^{-6}$ for single-qubit gates and $\sim 10^{-3}$ for two qubit gates [11]. Taking this as our
motivation, we use the number of C-NOT gates required in a decomposition as a measure of the
complexity of a gate sequence and we consider circuits that minimize the number of such gates.

This task has been previously considered both for arbitrary unitary operations and for state
preparation. In [12], a decomposition scheme was found for an arbitrary unitary operation on $n$
qubits that requires $\frac{23}{48} 4^n$ C-NOTs to leading order, approximately twice as many as the
best known lower bound. Similarly, in order to prepare a state of $n$ qubits (starting from the
state $|0\rangle^{\otimes n}$), the best known construction requires $\frac{23}{24} 2^n$ C-NOTs to
leading order if $n$ is even, and $2^n$ to leading order if $n$ is odd, which is again approximately
twice the best known lower bound.


State preparation and arbitrary unitaries are special cases of a wider class of operations,
isometries. An isometry is an inner-product preserving transformation that maps between two Hilbert
spaces that in general have different dimensions. Physically, isometries can be thought of as the
introduction of ancilla qubits in a fixed state (conventionally $|0\rangle$) followed by a general
unitary on the system and ancilla qubits. However, because its action only has to be specified when
the ancilla systems start in state $|0\rangle$, there is a lot of freedom when constructing the
general unitary. This freedom can be exploited to lower the number of C-NOTs needed with respect to
that of a general unitary. In the special case where the input and output spaces have the same
dimensions, the isometry is a unitary operation, while state preparation corresponds to an isometry
from a (trivial) one-dimensional space to that of the required output. In this manuscript we
consider the problem of synthesis of general isometries from $m$ qubits to $n \geq m$ qubits.

TABLE I: Lowest known upper bounds and highest known lower bounds on the number of C-NOT gates
required to decompose m to n isometries for large n. For simplicity, all the counts are given to
leading order. As is to be expected, the number of required C-NOT gates increases with m (i.e., when
fewer of the input qubits start in a fixed state).

m                      Lower bound (LB)                              Upper bound (UB)                     UB/LB        References for upper bound
m = 0 (SP)             $\frac{1}{2} 2^n$                             $\frac{23}{24} 2^n$                  ~ 1.9        state preparation schemes for even and odd n
1 <= m <= n - 2        $\frac{1}{2} 2^{n+m} - \frac{1}{4} 4^m$       $2^{n+m} - \frac{1}{24} 2^n$         < 2.3 (a)    Eq. (A21), (Theorem 2) (b)
m = n - 1              $\frac{3}{16} 4^n$                            $\frac{23}{64} 4^n$                  ~ 1.9        Eq. (A22)
m = n (Unitary)        $\frac{1}{4} 4^n$                             $\frac{23}{48} 4^n$                  ~ 1.9        [12]

(a) If $1 \leq m \leq n - 5$ we have UB/LB < 2 (for large enough n).

(b) In the case $5 \leq m \leq n - 2$ and even n, Theorem 2 achieves a slightly lower C-NOT count of
$\frac{23}{24}(2^{m+n} + 2^n)$ to leading order.

This task was first considered by Knill, whose decomposition scheme is based on a decomposition
scheme for state preparation (and uses such a scheme as a black box). His decomposition scheme
together with the known state preparation schemes leads directly (without any optimizations) to a
decomposition of m to n isometries requiring about $2 \cdot 2^{m+n}$ C-NOTs to leading order.
However, this can be modified (together with a suitable decomposition scheme for state preparation)
to achieve $\frac{23}{24}(2^{m+n} + 2^n)$ to leading order for even $n$, which is our first
decomposition scheme.

We also introduce two others. Our second scheme is a column-by-column decomposition of an isometry
that requires about $2^{m+n}$ C-NOT gates to leading order. This decomposition also performs well
for cases where m and n are small. For our final scheme, we adapt the decomposition of arbitrary
unitaries [12] to isometries, leading to a C-NOT count of about $0.16 \cdot (4^m + 2 \cdot 4^n)$ to
leading order.

To compare the quality of our schemes we give a theoretical lower bound on the number of C-NOT
gates required to decompose arbitrary isometries. These results are summarized in Tables I and II.
As shown in Table I, for large enough n, in the worst case our decomposition scheme uses roughly
2.3 times the number of C-NOTs required by the lower bound (the worst case being an $n - 2$ to $n$
isometry). This is comparable to the factor of 1.9 already known in the special cases of state
preparation and of arbitrary unitary operations.
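The quoted ratios can be checked numerically. The following is a minimal sketch, assuming the leading-order expressions as we read them from Tables I and II (lower-order terms such as $O(n^2) 2^m$ are ignored); the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def lower_bound_leading(m, n):
    """Leading-order lower bound: (1/2) 2^(n+m) - (1/4) 4^m."""
    return Fraction(2 ** (n + m), 2) - Fraction(4 ** m, 4)


def knill_leading(m, n):
    """Optimized Knill scheme: prefactor 23/24 (n even) or 115/96 (n odd)."""
    coeff = Fraction(23, 24) if n % 2 == 0 else Fraction(115, 96)
    return coeff * (2 ** (m + n) + 2 ** n)


def ccd_leading(m, n):
    """Column-by-column decomposition: 2^(m+n) - (1/24) 2^n."""
    return 2 ** (m + n) - Fraction(2 ** n, 24)


def csd_leading(m, n):
    """Cosine-Sine based scheme: (23/144) (4^m + 2 * 4^n)."""
    return Fraction(23, 144) * (4 ** m + 2 * 4 ** n)


def best_ratio(m, n):
    """Ratio of the smallest leading-order upper bound to the lower bound."""
    upper = min(knill_leading(m, n), ccd_leading(m, n), csd_leading(m, n))
    return float(upper / lower_bound_leading(m, n))


if __name__ == "__main__":
    for m, n in [(0, 30), (29, 31), (30, 30)]:
        print(m, n, round(best_ratio(m, n), 4))
```

Under these assumptions the ratio approaches 16/7 (about 2.29) for an $n - 2$ to $n$ isometry with odd $n$, and 23/12 (about 1.92) for state preparation and for unitaries.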

In addition, we optimize the C-NOT counts for $m$ to $n \leq 4$ isometries in Appendix B (see
Table III for a summary). These are most likely to be of practical relevance for experiments
performed in the near future.

The C-NOT counts in Table I, Table II and Table III can be directly used to upper bound the total
number of gates needed for the decomposition. Since each C-NOT gate can introduce at most two
single-qubit gates into a quantum circuit without redundancy (cf. Section III for similar
arguments), the number of single-qubit gates¹ required for an isometry can be bounded by doubling
the counts given in the tables and adding n, the number of qubits in question.

Although we have ranked the decompositions in terms of gate counts above, there may be other
features of a given decomposition scheme that make it preferable to another, which may depend on the
physical setup. It is also interesting to note that our decomposition schemes use others in a black
box fashion (cf. Section IV for more details), e.g., the decomposition scheme of Knill uses a scheme
for state preparation as a black box. An improvement in the decomposition of the black box would
therefore directly improve the corresponding decomposition for an isometry, potentially altering the
ordering in terms of gate counts.


II. BACKGROUND INFORMATION AND 
NOTATION 


We work in the circuit model of quantum computation in which the fundamental information carriers
are qubits. A computational basis state of the $2^n$-dimensional Hilbert space $\mathcal{H}_n$ of an
$n$ qubit register can be written as
$|b_{n-1}\rangle \otimes |b_{n-2}\rangle \otimes \cdots \otimes |b_0\rangle$ or, in short notation,
as $|b_{n-1} b_{n-2} \cdots b_0\rangle$, where $b_i \in \{0, 1\}$. To abbreviate further we write
$|b_{n-1} b_{n-2} \cdots b_0\rangle = |\sum_{i=0}^{n-1} b_i 2^i\rangle_n$, i.e., we interpret the
bit string $b_{n-1} b_{n-2} \cdots b_0$ as a binary number. If $n = 1$ we omit the subindex. Thus,
$|1\rangle_3 = |001\rangle = |0\rangle \otimes |0\rangle \otimes |1\rangle$, for example.
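The labelling convention can be made concrete with a few lines of NumPy. This is only an illustrative sketch; the helper names `basis_state` and `tensor_of_bits` are ours and do not appear in the paper.

```python
import numpy as np


def basis_state(k, n):
    """Column vector of |k>_n: a 1 at position k in a 2^n-dimensional space."""
    vec = np.zeros(2 ** n)
    vec[k] = 1.0
    return vec


def tensor_of_bits(bits):
    """Build |b_{n-1}> (x) ... (x) |b_0> from a bit string 'b_{n-1}...b_0'."""
    state = np.array([1.0])
    for b in bits:
        # np.kron places earlier factors on the more significant positions.
        state = np.kron(state, np.eye(2)[int(b)])
    return state


if __name__ == "__main__":
    # |1>_3 = |001> = |0> (x) |0> (x) |1>
    print(np.array_equal(tensor_of_bits("001"), basis_state(1, 3)))
```

The most significant qubit is the leftmost tensor factor, so the integer label is simply the bit string read as a binary number.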

In the circuit model of quantum computation, information carried in qubit wires is modified by
quantum gates, which correspond mathematically to unitary operations.


¹ Note that we count arbitrary single-qubit gates here (rather than gates that rotate about a
fixed axis).
TABLE II: Overview of the number of C-NOT gates required to decompose m to n isometries using
different decomposition schemes (NB: for small n we have done some additional optimizations; see
Table III). Abbreviations used: (a) Column-by-column decomposition of an isometry;
(b) Decomposition of an isometry using the Cosine-Sine Decomposition.

Method               C-NOT count for an m to n isometry                                  References
Knill (optimized)    $\frac{23}{24}(2^{m+n} + 2^n) + O(n^2) 2^m$ if n is even           Theorem 2
                     $\frac{115}{96}(2^{m+n} + 2^n) + O(n^2) 2^m$ if n is odd           Theorem 2
CCD (a)              $2^{m+n} - \frac{1}{24} 2^n + O(n^2) 2^m$                          Eq. (A21)
CSD (b)              $\frac{23}{144}(4^m + 2 \cdot 4^n) + O(m)$                         Eq. (A22)

TABLE III: Smallest known achievable C-NOT counts for m to n isometries with $2 \leq n \leq 4$. The
counts for n = m are as in [12]. The counts for state preparation (m = 0) on two and three qubits
are taken from prior work, and the count for state preparation on four qubits follows from the
decomposition scheme described in Appendix A 5. The remaining cases are discussed in Appendix B.
Note that the C-NOT counts grow very fast. For example, any unitary on 10 qubits can be performed
using about 500000 C-NOT gates.

n \ m     0      1      2      3      4
2         1      2      3      -      -
3         3      9      14     20     -
4         8      22     54     73     100


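In particular, we will use the following single-qubit gates:

$R_x(\theta) = \begin{pmatrix} \cos[\theta/2] & -i\sin[\theta/2] \\ -i\sin[\theta/2] & \cos[\theta/2] \end{pmatrix}$,   (1)

$R_y(\theta) = \begin{pmatrix} \cos[\theta/2] & -\sin[\theta/2] \\ \sin[\theta/2] & \cos[\theta/2] \end{pmatrix}$,   (2)

$R_z(\theta) = \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}$,   (3)

which correspond to rotations by angle $\theta$ about the x-, y- and z-axes of the Bloch sphere. One
important special case is the NOT gate, $\sigma_x = i R_x(\pi)$, in terms of which the C-NOT gate
can be written as $|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes \sigma_x$.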
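As a quick numerical illustration of these definitions, the sketch below builds the three rotation gates and the C-NOT gate with NumPy. The matrix used for the z-rotation, $\mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$, is the standard convention and is an assumption on our part; the function names are hypothetical.

```python
import numpy as np


def rx(theta):
    """Rotation by theta about the x-axis of the Bloch sphere."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(theta):
    """Rotation by theta about the y-axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(theta):
    """Rotation by theta about the z-axis (standard convention, assumed)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])


SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
P1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|

# C-NOT with the first (most significant) qubit as control.
CNOT = np.kron(P0, np.eye(2)) + np.kron(P1, SIGMA_X)

if __name__ == "__main__":
    print(np.allclose(1j * rx(np.pi), SIGMA_X))  # NOT gate from an x-rotation
    print(np.allclose(CNOT @ CNOT, np.eye(4)))   # C-NOT is its own inverse
```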

Lemma 1 (ZYZ decomposition) For every unitary operation $U$ acting on a single qubit, there exist
real numbers $\alpha$, $\beta$, $\gamma$ and $\delta$ such that

$U = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta)$.   (4)

A proof of this decomposition can be found in standard references. Note that (by symmetry) Lemma 1
holds for any two orthogonal rotation axes. Lemma 1 shows that a single-qubit gate can be specified
by three real parameters neglecting the (physically insignificant) global phase $e^{i\alpha}$. This
is analogous to the description of a rotation in 3 dimensions being parameterized in terms of three
Euler angles, here $\beta$, $\gamma$ and $\delta$.
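One way to extract the four angles numerically is sketched below. It is not taken from the paper: it assumes the standard convention $R_z(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$, and the names `zyz_angles` and `from_zyz` are hypothetical.

```python
import numpy as np


def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def zyz_angles(u):
    """Return (alpha, beta, gamma, delta) with u = e^{i alpha} Rz(beta) Ry(gamma) Rz(delta)."""
    # Remove the global phase so that the remainder has determinant 1.
    alpha = np.angle(np.linalg.det(u)) / 2
    v = u * np.exp(-1j * alpha)
    # For v in SU(2): v[0,0] = cos(gamma/2) e^{-i(beta+delta)/2},
    #                 v[1,0] = sin(gamma/2) e^{+i(beta-delta)/2}.
    gamma = 2 * np.arctan2(abs(v[1, 0]), abs(v[0, 0]))
    beta = np.angle(v[1, 0]) - np.angle(v[0, 0])
    delta = -np.angle(v[1, 0]) - np.angle(v[0, 0])
    return alpha, beta, gamma, delta


def from_zyz(alpha, beta, gamma, delta):
    """Rebuild the unitary from its ZYZ angles."""
    return np.exp(1j * alpha) * rz(beta) @ ry(gamma) @ rz(delta)


if __name__ == "__main__":
    hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    print(np.allclose(from_zyz(*zyz_angles(hadamard)), hadamard))
```

The three angles $\beta$, $\gamma$, $\delta$ returned here are the free parameters counted in the lower-bound argument of Section III.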


It is convenient to represent quantum circuits diagrammatically. Each qubit is represented by a
wire and gates are shown using a variety of symbols. Conventionally time flows from left to right.
We will use the concept of circuit topologies throughout this paper. A general circuit topology
corresponds to a set of quantum circuits that have a particular structure, but in which some gates
may be free or have free parameters. For example, Lemma 1 can be expressed as an equivalence of two
circuit topologies.


--[ U ]-- = --[ R_z ]--[ R_y ]--[ R_z ]--


The general meaning of a circuit topology equivalence is the following: for all possible values of
the (free) parameters of the circuit topology on the left hand side there exist values for the
parameters of the circuit topology on the right hand side such that the two sides perform the same
operation (up to a global phase). For example, each of the $R_z$ gates in the above circuit
represents a z-rotation gate with unspecified angle. If we use symbols for certain gates that have
not been introduced before, they are considered to be arbitrary quantum gates (these will often be
denoted by $U$). If the same symbol is used as a placeholder for more than one quantum gate, we mean
that all gates are of this form, but the gates themselves don't have to be identical (as in the
previous example where, although $R_z$ appears twice on the right hand side, each instance can have
a different rotation angle).


III. LOWER BOUND 

First we derive a theoretical lower bound on the number of C-NOT gates required to decompose an
isometry. For this purpose we use a similar argument as that used to derive theoretical lower bounds
for general quantum gates or for state preparation. Let $m$ and $n$ be natural numbers with
$n \geq 2$ and $m \leq n$. An $m$ to $n$ isometry can be represented by a $2^n \times 2^m$ complex
matrix $V$ satisfying $V^\dagger V = I_{2^m \times 2^m}$. The matrix has $2^{n+m+1}$ real entries
and the condition $V^\dagger V = I_{2^m \times 2^m}$ imposes $2^{2m}$ independent real constraints.
Therefore such an isometry is described by $2^{n+m+1} - 2^{2m} - 1$ real parameters, where the $-1$
accounts for the physically negligible global phase.

We can think of this isometry in terms of a unitary operation on $n$ qubits, $n - m$ of which always
start in a fixed state, which we take to be $|0\rangle$². Without any C-NOTs, all we can do is apply
single-qubit unitaries individually to each of these $n$ qubits. Each such unitary introduces at
most 3 parameters (cf. Lemma 1). However, for the qubits that start in state $|0\rangle$, only two
parameters are introduced, since a qubit state is fully specified by two real parameters. In order
to introduce further parameters, C-NOT gates are required.

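One might expect each C-NOT gate to allow the introduction of six real parameters by placing
arbitrary single-qubit rotations after the control and target. However, since $R_z$ gates commute
with control qubits, and $R_x$ gates with target qubits, we can introduce at most four parameters
for each additional C-NOT gate. In essence we are using the following circuit identity

[Circuit diagram, Eq. (5): an $R_z$ gate commutes through the control of a C-NOT and an $R_x$ gate
commutes through its target.]

which implies

[Circuit diagram: arbitrary single-qubit gates after a C-NOT can be reduced to two rotations on the
control wire and two rotations on the target wire.]

We conclude that we can introduce at most $3m + 2(n - m) + 4r$ real parameters using $r$ C-NOT
gates.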

In order to be a valid circuit topology, i.e., one that can generate every $m$ to $n$ isometry by an
appropriate choice of its parameters, the number of parameters introduced into the circuit by the
single-qubit rotations must be at least the number of parameters required to specify an arbitrary
$m$ to $n$ isometry. Thus, the number of C-NOTs required for such a circuit topology,
$N_{iso}(m, n)$, must satisfy $3m + 2(n - m) + 4 N_{iso}(m, n) \geq 2^{n+m+1} - 2^{2m} - 1$. From
this we obtain the following lower bound

$N_{iso}(m, n) \geq \frac{1}{4} \left( 2^{n+m+1} - 2^{2m} - 2n - m - 1 \right)$.   (6)

We remark that we can rephrase our result (by similar arguments as used for unitaries and state
preparation) as follows: almost every $m$ to $n$ isometry cannot be decomposed into a quantum
circuit (comprising single-qubit unitaries and C-NOTs) with fewer than
$\lceil \frac{1}{4} ( 2^{n+m+1} - 2^{2m} - 2n - m - 1 ) \rceil$ C-NOT gates. It is worth saying that
the set of measure zero that is excluded from this statement contains several interesting
isometries, for example that required for Shor's algorithm. This lower bound provides a limitation
on a universal quantum computer, rather than one tailored to a specific task.
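The parameter-counting argument translates directly into a short calculation. The sketch below evaluates the bound of equation (6), rounded up to an integer; the function names are ours and purely illustrative.

```python
def isometry_parameters(m, n):
    """Real parameters of an m to n isometry, global phase removed."""
    return 2 ** (n + m + 1) - 2 ** (2 * m) - 1


def free_parameters_without_cnots(m, n):
    """3 per free input qubit, 2 per qubit that starts in |0>."""
    return 3 * m + 2 * (n - m)


def cnot_lower_bound(m, n):
    """Smallest integer r with 3m + 2(n - m) + 4r >= number of parameters."""
    missing = isometry_parameters(m, n) - free_parameters_without_cnots(m, n)
    return max(0, -(-missing // 4))  # ceiling division by 4


if __name__ == "__main__":
    for m, n in [(0, 2), (2, 2), (3, 3), (4, 4)]:
        print(m, n, cnot_lower_bound(m, n))
```

For a two-qubit unitary (m = n = 2) this gives 3, which coincides with the achievable count listed in Table III.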


² Note that additional ancilla qubits will not affect the lower bound. This can be seen by using
the same arguments that we use in the derivation of the lower bound for quantum channels.


IV. DECOMPOSITION SCHEMES FOR 
ISOMETRIES 

Any isometry, $V$, from $m$ qubits to $n$ qubits can be described by a $2^n \times 2^m$ matrix.
This can instead be represented by a $2^n \times 2^n$ unitary matrix, $U$, by writing
$V = U I_{2^n \times 2^m}$, where $I_{2^n \times 2^m}$ denotes the first $2^m$ columns of the
$2^n \times 2^n$ identity matrix. Note that $U$ is not unique (unless $m = n$). Our aim is to find a
decomposition of a quantum gate of the form $U$ in terms of C-NOTs and single-qubit gates. We
describe three constructive decomposition schemes for arbitrary isometries. This section focuses on
the ideas behind these decomposition schemes; the full technical details can be found in Appendix A.
It is also worth noting that the proof of each of these schemes can be seen as an alternative way to
prove the universality of the gate library containing single-qubit and C-NOT gates.
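The non-uniqueness of the unitary extension can be illustrated numerically. The following sketch completes a random isometry to one possible unitary using a full QR factorization; this is just one convenient choice of extension, not a construction used in the paper, and the helper names are hypothetical.

```python
import numpy as np


def random_isometry(m, n, seed=0):
    """Random 2^n x 2^m matrix with orthonormal columns."""
    rng = np.random.default_rng(seed)
    shape = (2 ** n, 2 ** m)
    a = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    q, _ = np.linalg.qr(a)  # reduced QR: q has orthonormal columns
    return q


def extend_to_unitary(v):
    """Return a unitary U whose first columns are exactly the columns of v."""
    cols = v.shape[1]
    # The trailing columns of the complete Q factor span the orthogonal
    # complement of the column space of v.
    q, _ = np.linalg.qr(v, mode="complete")
    u = q.copy()
    u[:, :cols] = v
    return u


if __name__ == "__main__":
    v = random_isometry(1, 3)
    u = extend_to_unitary(v)
    first_columns = np.eye(8)[:, :2]  # the matrix I_{2^n x 2^m}
    print(np.allclose(u @ first_columns, v))
    print(np.allclose(u.conj().T @ u, np.eye(8)))
```

Any other orthonormal basis of the orthogonal complement would give an equally valid extension, which is the freedom the decomposition schemes exploit.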


A. Notation for controlled gates 

We use $l$-qubit-$C_k(U)$ to denote a gate that performs a different $l$-qubit unitary for each
possible state of $k$ control qubits, where $U$ is a placeholder for a size $2^k$ set of
$2^l$-dimensional unitary operations. We call an operation of this type a uniformly controlled gate
(UCG). These are also referred to as "multiplexed gates" by some authors, e.g. [12]. If $l = 1$ we
abbreviate the notation to $C_k(U)$. If we write $R_x$, $R_y$ or $R_z$ instead of $U$, we mean that
all the $2^k$ single-qubit gates that determine the UCG are of the form of the corresponding
rotation gate.

In order to write such gates out more precisely, we split the Hilbert space of $n$ qubits into a
$2^k$-dimensional space corresponding to the control qubits, a $2^l$-dimensional space corresponding
to the target qubits and a $2^f$-dimensional space, where $f := n - l - k$, corresponding to the
free qubits, i.e., the qubits we neither control nor act on:
$\mathcal{H}_n = \mathcal{H}_k \otimes \mathcal{H}_l \otimes \mathcal{H}_f$. If $F$ is an
$l$-qubit-$C_k(U)$ gate, then it acts according to

$F \left( |i_1\rangle_k \otimes |i_2\rangle_l \otimes |i_3\rangle_f \right) = |i_1\rangle_k \otimes \left( U_{i_1} |i_2\rangle_l \right) \otimes |i_3\rangle_f$,   (7)

where $i_1 \in \{0, \ldots, 2^k - 1\}$, $i_2 \in \{0, \ldots, 2^l - 1\}$,
$i_3 \in \{0, \ldots, 2^f - 1\}$ and $U_{i_1}$ denotes the quantum gate acting on the target qubits
if the control qubits are in the state $|i_1\rangle_k$. If each member of the set $\{U_{i_1}\}$
apart from one (call this one $U_j$) is equal to the identity operation, we drop the word
"uniformly" and call such an operation a $k$-controlled $l$-qubit gate, denoted by
$l$-qubit-$C_k(U_j)$, or more generally a multi-controlled gate (MCG). If $l = 1$ and we want to
emphasize the total number $n$ of qubits of the system being considered, we add an $n$ as a second
subindex, i.e. $C_k(U)$ becomes $C_{k,n}(U)$.

By way of example, the following circuit diagram shows a 2-qubit-$C_2(U)$, $C_3(U)$ (or
$C_{3,4}(U)$) and $C_2(U)$ (or $C_{2,4}(U)$) gate in this order (from left to right).

[Circuit diagram: a 2-qubit-$C_2(U)$ gate, a $C_3(U)$ gate and a $C_2(U)$ gate on four qubits.]

Note that the $C_k(U)$ notation does not specify which are the control and which are the target
qubits and whether we control on $|1\rangle$ (filled circle) or on $|0\rangle$ (unfilled circle);
these must be made clear in the particular context.

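Each uniformly $k$-controlled gate can be decomposed into a sequence of $2^k$ $k$-controlled gates,
as should be clear from the following example for the case $k = 2$, $l = n - 2$ and $n \geq 3$.

[Circuit diagram: a uniformly 2-controlled gate written as a product of four 2-controlled gates,
one for each state of the two control qubits.]

The symbol "\" stands for a data bus of several (in this case $l$) qubits. Note that the UCG above
has block structure $U_0 \oplus U_1 \oplus U_2 \oplus U_3$.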
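The block structure can be checked with a small numerical sketch. It assumes that the control qubits are the most significant ones and the target is the least significant qubit; the helper names are hypothetical.

```python
import numpy as np


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def uniformly_controlled_gate(unitaries):
    """Block-diagonal matrix U_0 (+) U_1 (+) ... for 2^k control states."""
    d = unitaries[0].shape[0]
    size = d * len(unitaries)
    gate = np.zeros((size, size), dtype=complex)
    for i, u in enumerate(unitaries):
        gate[i * d:(i + 1) * d, i * d:(i + 1) * d] = u
    return gate


def multi_controlled_gate(k, j, u):
    """k-controlled gate: applies u only if the controls are in state |j>."""
    blocks = [np.eye(u.shape[0], dtype=complex) for _ in range(2 ** k)]
    blocks[j] = u
    return uniformly_controlled_gate(blocks)


if __name__ == "__main__":
    targets = [ry(0.3 * (i + 1)) for i in range(4)]  # one gate per control state
    ucg = uniformly_controlled_gate(targets)
    product = np.eye(8, dtype=complex)
    for j, u in enumerate(targets):
        product = multi_controlled_gate(2, j, u) @ product
    print(np.allclose(product, ucg))
```

The four multi-controlled gates act on disjoint blocks, so they commute and can be applied in any order.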

Remark 1 In Table IV of Appendix A we give an overview of C-NOT counts for some special controlled
gates that are used for decompositions arising in this paper.

B. Decomposition of isometries using the 
decomposition scheme of Knill 

In this section we combine the decomposition scheme for isometries of Knill and a decomposition
scheme for state preparation. The main result is as follows.

Theorem 2 Let $m$ and $n$ be natural numbers with $n \geq 5$ and $m \leq n$ and $V$ be an $m$ to
$n$ isometry. There exists a decomposition of $V$ in terms of single-qubit gates and C-NOTs such
that the number of C-NOT gates required satisfies³

$N_{iso}(m, n) \leq (2^m + 1) \left( N_U(\lfloor n/2 \rfloor) + N_U(\lceil n/2 \rceil) \right) + 2^{m+1} N_{SP}(\lfloor n/2 \rfloor) + O(n^2) 2^m$,   (8)

where $N_U(n)$ denotes the number of C-NOT gates required for an arbitrary unitary on $n$ qubits.
Using the best known C-NOT counts for unitaries and state preparation (cf. Table I) this leads to

$N_{iso}(m, n) \leq \frac{23}{24} (2^{m+n} + 2^n) + O(n^2) 2^m$ if $n$ is even,

$N_{iso}(m, n) \leq \frac{115}{96} (2^{m+n} + 2^n) + O(n^2) 2^m$ if $n$ is odd.

Remark 2 For large $n$, the last two terms in (8) are negligible. The leading order for this scheme
is therefore derived from that of a unitary on $n/2$ qubits.

Consider a set of unitary operations $\{V_i\}_{i=0}^{2^m - 1}$ such that
$V_i |0\rangle = V |i\rangle$, i.e., $V_i$ is a unitary for state preparation on the state
corresponding to the $i$th column of $V$. In the proof of Theorem 3.1 of Knill's work it is shown
that

$U = V_{2^m - 1} C_{n-1}(P(\theta_{2^m - 1})) V_{2^m - 1}^\dagger \cdots V_0 C_{n-1}(P(\theta_0)) V_0^\dagger$,   (9)

where the gate $P(\theta) := e^{i\theta} |0\rangle\langle 0| + |1\rangle\langle 1|$. Consider
decomposing each $V_i$ using the (reverse of the) decomposition scheme for state preparation. This
leads to a circuit containing $2^m - 1$ instances of the following circuit diagram (shown in the
case where $n$ is even), each corresponding to a unitary of the form $V_{i+1}^\dagger V_i$.

[Circuit diagram: the reversed state preparation circuit for $V_i$ followed by the state
preparation circuit for $V_{i+1}^\dagger$; each half acts with unitaries on the upper and lower
$n/2$ qubits.]

We can merge the adjacent unitaries in pairs, replacing each pair by a single unitary on the same
qubits.

[Circuit diagram: the same circuit with the adjacent unitaries merged.]

We decompose all the terms of the form $V_{i+1}^\dagger V_i$ in equation (9) in this way. The gates
$V_{2^m - 1}$ and $V_0^\dagger$ can also be decomposed using the (reversed) decomposition scheme for
state preparation. The $C_{n-1}(P(\theta_i))$ gates are special cases of $C_{n-1}(U)$ gates. Hence,
each can be decomposed into $16n^2 - 60n + 42$ C-NOT gates (see Appendix A). This leads to the
claimed C-NOT count given in equation (8).

³ The exact count for this decomposition can be obtained by replacing $O(n^2)$ by
$16n^2 - 60n + 42$.

C. Column-by-column decomposition


In this section we introduce a circuit topology corresponding to a column-by-column decomposition
of an arbitrary isometry, i.e., we decompose any isometry into single-qubit and C-NOT gates
proceeding one column at a time.

Theorem 3 Let $m$ and $n$ be natural numbers with $n \geq 2$ and $m \leq n$ and $V$ be an $m$ to
$n$ isometry. There exists a decomposition of $V$ in terms of single-qubit gates and C-NOTs such
that the number of C-NOT gates required satisfies

$N_{iso}(m, n) \leq 2^m \sum_{s=0}^{n-1} N_{\Delta C_{n-1-s}(U)} + O(n^2) 2^m$,

where $N_{\Delta C_{n-1-s}(U)}$ denotes the number of C-NOT gates required to decompose a
$C_{n-1-s}(U)$ gate up to a diagonal gate $\Delta$, i.e., to decompose the two gates together, where
the gate $C_{n-1-s}(U)$ is determined but we are free to choose the diagonal gate $\Delta$. Together
with the best known decomposition scheme for UCGs (up to diagonal gates) this leads to

$N_{iso}(m, n) \leq 2^{m+n} + O(n^2) 2^m$.

FIG. 1: Implementing the first column of an isometry $V$ from $m \geq 0$ qubits to $n = 4$ qubits.
The action of $G_0$ on $|\psi_0^0\rangle := V |0\rangle_m$ can be decomposed into operators
$\{G_0^i\}_{i \in \{0,1,2,3\}}$, where $G_0^i := C_{3-i}(U_{0,i})$. The upper part shows how these
gates successively zero the entries of the column, while the lower part gives the circuit
representation. The inverse of this decomposition scheme was introduced for state preparation
together with an efficient decomposition of the uniformly controlled gates $G_0^i$ into C-NOTs and
single-qubit gates. The symbol * denotes an arbitrary complex number.

We defer a rigorous proof of the theorem to Appendix A 3 and instead use this section to explain
the main ideas behind the argument. Our proof is constructive, and the exact C-NOT count is given in
equation (A21).

As before, we represent the $m$ to $n$ isometry $V$ by a $2^n \times 2^n$ unitary matrix, here
$G^\dagger$, by writing $V = G^\dagger I_{2^n \times 2^m}$. Since a C-NOT gate is inverse to itself
and the inverse of a single-qubit unitary is another single-qubit unitary, searching for a
decomposition scheme for $G^\dagger$ is equivalent to searching for a decomposition of a unitary
operation $G$ satisfying $GV = I_{2^n \times 2^m}$.

In essence, the idea is to find a sequence of unitary operations that when applied to $V$
successively bring it closer to $I_{2^n \times 2^m}$. We will do this in a column by column fashion,
first choosing a sequence of quantum gates, corresponding to a unitary $G_0$ that gets the first
column right, i.e., $G_0 V |0\rangle_m = I_{2^n \times 2^m} |0\rangle_m = |0\rangle_n$. We then use
$G_1$ to get the second column right without affecting the first, i.e.,
$G_1 G_0 V |1\rangle_m = I_{2^n \times 2^m} |1\rangle_m = |1\rangle_n$ and
$G_1 G_0 V |0\rangle_m = G_1 |0\rangle_n = |0\rangle_n$, and so on (up to the $2^m$th column). In
other words, $G_k$ gets the $(k + 1)$th column right and acts trivially on the first $k$ columns of
$I_{2^n \times 2^m}$.

The gate $G_0$ can be decomposed into single-qubit and C-NOT gates by reversing a decomposition
scheme for the preparation of a state (applied to $V |0\rangle_m$). It is natural to imagine
repeating this construction for each column in turn. However, without further modification, this
procedure doesn't work since the action required for the decomposition of later columns affects
those that have already been done. In other words, if we construct a unitary $G_1$ again by
reversing a decomposition scheme for state preparation, we can obtain
$G_1 G_0 V |1\rangle_m = |1\rangle_n$, but, in general, $G_1 G_0 V |0\rangle_m \neq |0\rangle_n$. We
therefore introduce a modified technique that takes this into account while only slightly increasing
the number of C-NOT gates needed over that required for state preparation on each column. This
technique develops an idea used for state preparation using uniformly controlled gates.

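Lemma 4 Let $|\psi'\rangle \in \mathcal{H}_1$ and define $r$ such that
$\langle \psi' | \psi' \rangle = r^2$. There exist $U_0, U_1 \in SU(2)$, such that

$U_0 |\psi'\rangle = r |0\rangle$,   (10)

$U_1 |\psi'\rangle = r |1\rangle$.   (11)

Proof. Define $|\psi\rangle = \frac{1}{r} |\psi'\rangle$ and
$|\phi\rangle = -\langle \psi | 1 \rangle |0\rangle + \langle \psi | 0 \rangle |1\rangle \in \mathcal{H}_1$.
Then $U_0 = |0\rangle\langle \psi| + |1\rangle\langle \phi|$ is unitary with $\det U_0 = 1$ and
obeys equation (10). $U_1$ can be obtained analogously. ■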
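The construction in the proof can be written out explicitly. In the sketch below, $U_0$ follows the proof above, while the explicit form chosen for $U_1$ is our own "analogous" choice; the function name is hypothetical.

```python
import numpy as np


def lemma4_unitaries(psi_prime):
    """Return (r, U0, U1) with U0 psi' = r|0> and U1 psi' = r|1>, both in SU(2)."""
    psi_prime = np.asarray(psi_prime, dtype=complex)
    r = np.linalg.norm(psi_prime)
    if r == 0:
        # The zero vector is mapped to zero by any unitary.
        return 0.0, np.eye(2, dtype=complex), np.eye(2, dtype=complex)
    a, b = psi_prime / r  # normalized amplitudes of |psi>
    # U0 = |0><psi| + |1><phi| with |phi> = -conj(b)|0> + conj(a)|1>.
    u0 = np.array([[np.conj(a), np.conj(b)], [-b, a]])
    # Assumed analogue for U1: U1 = |1><psi| - |0><phi|.
    u1 = np.array([[b, -a], [np.conj(a), np.conj(b)]])
    return r, u0, u1


if __name__ == "__main__":
    psi = np.array([1 + 2j, 3 - 1j])
    r, u0, u1 = lemma4_unitaries(psi)
    print(np.allclose(u0 @ psi, [r, 0]), np.allclose(u1 @ psi, [0, r]))
    print(np.isclose(np.linalg.det(u0), 1), np.isclose(np.linalg.det(u1), 1))
```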

As noted above, the unitary operation $G_0$ can be decomposed using the reverse of the
decomposition scheme for state preparation. First we act with a UCG $G_0^0 = C_{n-1}(U_{0,0})$ on
the least significant qubit. The gate $G_0^0$ has a $2 \times 2$ block diagonal structure. Using
Lemma 4 we can construct $G_0^0$ such that it zeroes every second entry of
$|\psi_0^0\rangle := V |0\rangle_m$ (see Fig. 1). This corresponds to disentangling (i.e., rotating
to product form) the least significant qubit, so we can write
$G_0^0 |\psi_0^0\rangle = |\psi_0^1\rangle \otimes |0\rangle$ for some state
$|\psi_0^1\rangle \in \mathcal{H}_{n-1}$. Now we apply the same procedure to $|\psi_0^1\rangle$
leaving the least significant qubit invariant. We act with $G_0^1 := C_{n-2}(U_{0,1})$, which
corresponds to conditionally rotating the second least significant qubit, leading to
$G_0^1 G_0^0 |\psi_0^0\rangle = |\psi_0^2\rangle \otimes |0\rangle \otimes |0\rangle$, for some
$|\psi_0^2\rangle \in \mathcal{H}_{n-2}$. We continue in this fashion until all the qubits have been
disentangled. Thus we have constructed a quantum gate $G_0 := G_0^{n-1} \cdots G_0^0$ such that
$G_0 |\psi_0^0\rangle = |0\rangle_n$⁴.

FIG. 2: Implementing the second column of an isometry $V$ from $m \geq 1$ qubits to $n = 4$ qubits.
The operation of $G_1$ on $|\psi_1^0\rangle := G_0 V |1\rangle_m$ can be decomposed into operators
$\{G_1^i\}_{i \in \{0,1,2,3\}}$, where $G_1^0 = C_3(U_{1,0})$,
$G_1^1 = C_3(U'_{1,1}) C_2(U_{1,1})$, $G_1^2 = C_3(U'_{1,2}) C_1(U_{1,2})$ and
$G_1^3 = C_3(U_{1,3})$. Note that all these gates act trivially on $|0\rangle_n$. The symbol *
denotes an arbitrary complex number.
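The disentangling of the first column can be simulated numerically. The following is a minimal sketch under the assumption that each UCG block is chosen as $U_0$ from Lemma 4; it tracks only the matrices, not their decomposition into C-NOTs, and all function names are hypothetical.

```python
import numpy as np


def zeroing_block(pair):
    """SU(2) matrix mapping the 2-vector `pair` to (norm, 0), as in Lemma 4."""
    r = np.linalg.norm(pair)
    if r < 1e-12:
        return np.eye(2, dtype=complex)
    a, b = pair / r
    return np.array([[np.conj(a), np.conj(b)], [-b, a]])


def disentangle_column(psi):
    """Return one list of 2x2 blocks per UCG; layer s has 2^(n-1-s) blocks."""
    psi = np.asarray(psi, dtype=complex)
    layers = []
    while psi.size > 1:
        pairs = psi.reshape(-1, 2)  # neighbouring entries differ in the lowest qubit
        blocks = [zeroing_block(p) for p in pairs]
        layers.append(blocks)
        # Keep the surviving amplitude of each pair; the disentangled qubit is in |0>.
        psi = np.array([(u @ p)[0] for u, p in zip(blocks, pairs)])
    return layers


def layer_matrix(blocks, n_qubits):
    """Full 2^n x 2^n matrix of one UCG; already disentangled qubits see the identity."""
    size = 2 * len(blocks)
    ucg = np.zeros((size, size), dtype=complex)
    for i, u in enumerate(blocks):
        ucg[2 * i:2 * i + 2, 2 * i:2 * i + 2] = u
    return np.kron(ucg, np.eye(2 ** n_qubits // size))


def column_unitary(psi, n_qubits):
    """Product G_0 = G_0^{n-1} ... G_0^0 of all layers."""
    g = np.eye(2 ** n_qubits, dtype=complex)
    for blocks in disentangle_column(psi):
        g = layer_matrix(blocks, n_qubits) @ g
    return g


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    psi = rng.normal(size=16) + 1j * rng.normal(size=16)
    psi /= np.linalg.norm(psi)
    g0 = column_unitary(psi, 4)
    print(np.allclose(g0 @ psi, np.eye(16)[0]))
```

Running the layers in reverse order with inverted blocks prepares the state from $|0\rangle_n$, which is the sense in which the construction reverses state preparation.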

In the following we describe how to construct a unitary $G_1$ setting the second column of $G_0 V$
to $(0, 1, 0, \ldots, 0)$ without affecting the first column. We construct
$G_1^0 = C_{n-1}(U_{1,0})$ choosing the unitary operations such that the first entry of each pair
becomes zero (see Fig. 2). In other words, defining $|\psi_1^0\rangle := G_0 V |1\rangle_m$ we have
$G_1^0 |\psi_1^0\rangle = |\psi_1^1\rangle \otimes |1\rangle$, for some state $|\psi_1^1\rangle$.
Note that, by construction, the first column of $G_0 V$ in matrix form is $(1, 0, \ldots, 0)$, and,
since $G_0$ is unitary, the first row also has the form $(1, 0, \ldots, 0)$. Hence the first entry
of $|\psi_1^0\rangle$ is already 0 and we can set the upper most $2 \times 2$ block of the uniformly
controlled gate $G_1^0$, i.e. the block acting on the states $|0\rangle_n$ and $|1\rangle_n$, to the
identity. Therefore we can perform this step without affecting the first column, i.e.
$G_1^0 G_0 V |0\rangle_m = G_1^0 |0\rangle_n = |0\rangle_n$. The next step would be to do the same
to $|\psi_1^1\rangle$ (i.e., zero every second entry). Doing so using a $C_{n-2}(U)$ gate would, in
general, have a non-trivial effect on the basis state $|0\rangle_n$. Therefore we modify the
procedure and instead use a $C_{n-2}(U)$ gate to zero every second entry except that in the upper
most double block of $|\psi_1^1\rangle$, or equivalently that in the upper most block of four
elements of $G_1^0 |\psi_1^0\rangle$. We subsequently correct for this using an additional MCG
acting on the second least significant qubit, i.e., we set
$G_1^1 = C_{n-1}(U'_{1,1}) C_{n-2}(U_{1,1})$. With this additional MCG we can directly address the
quantum states corresponding to the two non zero entries in the upper-most four-element block.
Indeed, controlling on $|0\rangle \otimes |0\rangle \otimes \cdots \otimes |0\rangle$ on the first
$(n - 2)$ qubits and on $|1\rangle$ on the least significant qubit we can zero the second non zero
entry of the upper-most four-element block without affecting $|0\rangle_n$.

⁴ Note that $G_0^\dagger$ is a circuit for preparing the state $|\psi_0^0\rangle$; in this sense we
have performed the inverse of state preparation.

We conclude that
$G_1^1 G_1^0 |\psi_1^0\rangle = |\psi_1^2\rangle \otimes |0\rangle \otimes |1\rangle$ and
$G_1^1 G_1^0 |0\rangle_n = |0\rangle_n$. We continue in this way, until the most significant qubit
is disentangled. We have therefore constructed an operation $G_1$ such that
$G_1 G_0 V |1\rangle_m = G_1 |\psi_1^0\rangle = |1\rangle_n$ and
$G_1 G_0 V |0\rangle_m = G_1 |0\rangle_n = |0\rangle_n$.

This procedure can be continued in a similar fashion, leading to unitaries $G_k$ such that
$G_k G_{k-1} \cdots G_0 V |k\rangle_m = |k\rangle_n$ and $G_k |i\rangle_n = |i\rangle_n$ for all
$i \in \{0, 1, \ldots, k - 1\}$. For a general description of the construction of the unitary $G_k$
see Appendix A 3. We can hence construct a unitary operator
$G := G_{2^m - 1} G_{2^m - 2} \cdots G_0$ satisfying $GV = I_{2^n \times 2^m}$.

In order to compute the number of C-NOTs used for such a decomposition, we use the following
existing results:

(i) $N_{\Delta C_k(U)} = 2^k - 1$ C-NOTs are sufficient to decompose a UCG with $k$ controls, up to
a diagonal gate [16].

(ii) $N_\Delta(m) = 2^m - 2$ C-NOTs are sufficient to decompose a diagonal gate acting non
trivially on $m$ qubits.

(iii) $N_{C_{n-1}(W)} = O(n)$ C-NOTs are sufficient to decompose an $(n - 1)$-controlled special
unitary gate $W$ [Corollary 7.10 of the cited reference].


To take advantage of (i) we require a small modification to our decomposition scheme. Instead of
implementing the UCGs exactly, we do so up to diagonal gates, i.e., for every $k$, instead of
$C_k(U)$ we implement $\Delta_{k+1} C_k(U)$, for some diagonal gate $\Delta_{k+1}$ on $k + 1$
qubits. The effect of these diagonal gates is then corrected for at the end of the entire circuit by
adding a diagonal gate that acts non-trivially on $m$ qubits and whose C-NOT count is given in (ii).
(In fact, the number of C-NOTs required for this is of sufficiently low order that it doesn't
feature in the count of Theorem 3.)

Furthermore, as shown in Lemma 4, we only require MCGs $C_{n-1}(W)$ for $W \in SU(2)$, and hence
can use (iii). In fact, we have modified the decomposition underlying (iii) and used some technical
tricks (see Appendix A 1) to obtain a C-NOT count for a $C_{n-1}(W)$ gate with leading order $28n$.

We conclude that we can decompose each column of an isometry using at most

$N_{col} = \sum_{s=0}^{n-1} \left( N_{\Delta C^u_{n-1-s}(U)} + N_{C_{n-1}(W)} \right) = \sum_{s=0}^{n-1} \left( (2^{n-s-1} - 1) + O(n) \right) = 2^n + O(n^2)$

C-NOTs. Note that (for simplicity) we have overcounted the number of additional MCGs, since in the above we have assumed that each UCG requires an additional MCG. Therefore, to decompose an m to n isometry, we require at most $2^m N_{col} + N_\Delta(m) = 2^m (2^n + O(n^2)) + 2^m = 2^{m+n} + O(n^2)\,2^m$ C-NOTs.
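The counting above can be checked numerically. The following Python sketch sums the UCG costs from (i) over one column and pairs every UCG with one MCG, as in the overcount described above. The function names, and the use of $28n$ as a stand-in for the $O(n)$ MCG cost, are assumptions made for illustration only.

```python
def ucg_cost_per_column(n):
    """Sum of the UCG costs 2**(n-s-1) - 1 from item (i), for s = 0, ..., n-1."""
    return sum(2 ** (n - s - 1) - 1 for s in range(n))


def column_bound(n, mcg_cost):
    """Overcounted bound for one column: every UCG is paired with one MCG."""
    return ucg_cost_per_column(n) + n * mcg_cost


def isometry_bound(m, n, mcg_cost):
    """2**m columns plus the final diagonal gate on m qubits from item (ii)."""
    return 2 ** m * column_bound(n, mcg_cost) + (2 ** m - 2)


if __name__ == "__main__":
    for n in (8, 12, 16):
        # 28 * n is an assumed stand-in for the O(n) cost of one MCG.
        total = isometry_bound(4, n, mcg_cost=28 * n)
        print(n, total, total / 2 ** (4 + n))
```

The printed ratio approaches 1 as $n$ grows, which reflects that the $O(n^2)\,2^m$ term is subleading compared with $2^{m+n}$.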

Note that we implement every column of the isometry in a similar fashion. However, there are a lot of constraints on the last few columns due to orthogonality, or, in other words, the first $k$ entries of $|\psi_k\rangle = G_{k-1} G_{k-2} \cdots G_0 V |k\rangle_m$ are already zero by construction and so we have only to act on the other $2^n - k$ entries. Therefore one might expect that the C-NOT count for $G_k$ decreases when $k$ increases. Since we use $2^n$ C-NOTs to leading order for each column, our decomposition scheme doesn't take advantage of this fact (for large $n$). Hence the column-by-column decomposition has some inefficiency in the case where $m \approx n$ (by comparison to the case $m \ll n$). To give an improved count in the cases $m = n - 1$ and $m = n$, we introduce a further decomposition scheme based on the CSD, which is adjusted to the unitary structure, in Section IV D. Note that this scheme corresponds exactly to the decomposition scheme of [13] in the case $m = n$.


Remark 3 In some physical realizations it is difficult to implement C-NOT gates between non-adjacent qubits. The decomposition in this section can be adapted to the gate library containing only nearest neighbour C-NOT and single-qubit gates in a relatively efficient way. To do so, note that the UCGs used to implement one column of an m to n isometry can be performed with at most $(5/3)2^n + O(n^2)$ nearest neighbour C-NOT gates [?]. Furthermore, since a C-NOT gate acting between qubits a distance $n$ apart can be decomposed using $O(n)$ nearest neighbour C-NOT gates [?], the MCGs used to implement one column use $O(n^3)$ nearest neighbour C-NOT gates. Therefore the decomposition of an m to n isometry uses at most $(5/3)2^{m+n} + O(n^3)\,2^m$ nearest neighbour C-NOT gates.


D. Decomposition of isometries using the 
Cosine-Sine Decomposition 

The most efficient known decomposition scheme for arbitrary unitary operators in terms of the number of C-NOT gates required uses the CSD [13]. In this section we adapt the decomposition scheme used in [13] to m to n isometries. To simplify the exposition, the count given here is not the lowest we can obtain; an improvement is given in Appendix A 4.

Theorem 5 Let $m$ and $n$ be natural numbers with $2 \le m \le n$ and $V$ be an isometry from $m$ qubits to $n$ qubits. There exists a decomposition of $V$ in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies

$N_{iso}(m, n) \le 3 \cdot 2^{2n-3} - 2^n + 2^{m-4} (3 \cdot 2^m - 8)$. (12)

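The Cosine-Sine Decomposition (CSD) was first used by [?] in the context of quantum computation. In particular, the CSD states that every unitary matrix $U \in \mathbb{C}^{2^n \times 2^n}$ can be decomposed in terms of unitaries $A_0, A_1, B_0, B_1 \in \mathbb{C}^{2^{n-1} \times 2^{n-1}}$ and real diagonal matrices $C$ and $S$ satisfying $C^2 + S^2 = I$:

$U = \begin{pmatrix} A_0 & 0 \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} C & -S \\ S & C \end{pmatrix} \begin{pmatrix} B_0 & 0 \\ 0 & B_1 \end{pmatrix}$. (13)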


The CSD can be summarized by the gate identity



Together with 



(which is Theorem 12 of El) it allows a recursive de¬ 
composition of an arbitrary unitary operation in terms 
of single-qubit gates and uniformly controlled Ry and Rz 
gates. 

In the case of an isometry, we again use a representation in terms of a unitary matrix, $V_n$, such that $V = V_n I_{2^n \times 2^m}$. Now, if $n > m$, we can take the control qubit of the first $(n-1)$-qubit $C^u_1(U_{n-1})$ gate to be in the state $|0\rangle$, and hence this gate need not be uniformly controlled. Thus, the following circuit identity holds

[circuit diagram]

Note that $V_{n-1}$ represents an $m$ to $n-1$ isometry. In the matrix representation the circuit identity above corresponds to setting $B_1 = B_0$ in equation (13). We can decompose the $(n-1)$-qubit $C^u_1(U)$ gate as above so that



We can use this idea to recursively decompose $V_n$. The uniformly $(n-1)$-controlled rotations can be decomposed using at most $2^{n-1}$ C-NOT gates [?]. The two $U_{n-1}$ gates can be decomposed by using the CSD and the circuit equivalence (14) recursively until two-qubit gates remain® (each of which can be implemented with 3 C-NOTs). In this way it can be shown that each $U_{n-1}$ requires at most $(9/16)4^{n-1} - (3/2)2^{n-1}$ C-NOT gates [13]. Note that this is not the optimal count reached in [13], but we use this slightly weaker count here for simplicity (a count that takes into account the additional optimizations of the Appendix of [13] can be found in Appendix A 4). The C-NOT count for an m to n isometry, $N_{iso}(m,n)$, hence satisfies the recursion relations

$N_{iso}(m, i+1) = N_{iso}(m, i) + \frac{9}{8} 4^i - 2^i$, if $m \le i < n$, (15)

$N_{iso}(m, m) = \frac{9}{16} 4^m - \frac{3}{2} 2^m$. (16)


Solving these leads to the claimed count. 
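As a sanity check, the recursion can be iterated and compared with the closed form of Theorem 5. The sketch below reads the recursion step as $N_{iso}(m,i+1) = N_{iso}(m,i) + \frac{9}{8}4^i - 2^i$ with starting value $N_{iso}(m,m) = \frac{9}{16}4^m - \frac{3}{2}2^m$; the function names are illustrative.

```python
from fractions import Fraction


def n_iso_recursive(m, n):
    """Iterate the recursion (15) starting from the unitary count (16)."""
    count = Fraction(9, 16) * 4 ** m - Fraction(3, 2) * 2 ** m
    for i in range(m, n):
        count += Fraction(9, 8) * 4 ** i - 2 ** i
    return count


def n_iso_closed(m, n):
    """Closed form (12): 3 * 2^(2n-3) - 2^n + 2^(m-4) * (3 * 2^m - 8)."""
    return (3 * Fraction(2) ** (2 * n - 3) - 2 ** n
            + Fraction(2) ** (m - 4) * (3 * 2 ** m - 8))


if __name__ == "__main__":
    for m in range(2, 6):
        for n in range(m, 8):
            print(m, n, n_iso_recursive(m, n), n_iso_closed(m, n))
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.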


Remark 4 (CSD approach zeroes too many entries) Recall that constructing a gate $V_n$ such that $V = V_n I_{2^n \times 2^m}$ is equivalent to constructing a gate $V_n^\dagger$ such that $V_n^\dagger V = I_{2^n \times 2^m}$. Therefore, rewriting equation (13), the first recursion step of the CSD approach leads to

$\begin{pmatrix} C & S \\ -S & C \end{pmatrix} \begin{pmatrix} A_0^\dagger & 0 \\ 0 & A_1^\dagger \end{pmatrix} U = \begin{pmatrix} B_0 & 0 \\ 0 & B_0 \end{pmatrix}$. (17)

If $m < n-1$ we apply the same procedure to $B_0$. However, in this case, we already zeroed more entries than necessary in the first recursion step. Specifically, it was unnecessary to zero at least half of the entries in the upper right and in the lower left $2^{n-1} \times 2^{n-1}$-dimensional block of the matrix on the rhs of equation (17), and the number of unnecessary zeros grows as $m$ decreases. This intuitively explains why the CSD approach is not well-suited to m to n isometries, where $m < n-1$: by zeroing too many entries, more C-NOT gates are used than needed.


® We could finish the recursion at any stage, such that only n-qubit unitaries remain. Therefore, an improvement of the C-NOT count for n-qubit unitaries could help to improve the C-NOT count given in equation (12) (and equation (A22)).


Remark 5 (Optimized state preparation) As a by-product of the above we obtain an improved bound over that of [?] on the number of C-NOT gates required for state preparation on an odd number $n = 2k + 1 \ge 5$
of qubits. The optimized decomposition is based on 0y 
and described in Section El The count \A3CA) using 
state preparation on k qubits, which requires 2 ^ — k — 1 
C-NOTs (as in gives the following count for state 

preparation starting from the basis state 

(n) 4 §2" - ^2^ + 4/3. (18) 

Previously, the bound of ||2"' C-nots to leading or¬ 
der was only known to be achievable for an even number 
of qubUs m ^^th a sUghtly weaker Lund of 2^ C-NOTs 
to leading order in the odd case mi- It is interesting 
to note the parallelizability of our circuit for state prepa¬ 
ration, similarly to mi-' The form of the circuit means 
that, for large (odd) n, the circuit depth (i.e., the num¬ 
ber of computational steps needed to perform the circuit) 
is about 3/4 of the total gate count. Measuring the cir¬ 
cuit depth only in terms of C-NOTs, our decomposition 
scheme has depth ||2" to leading order, improving the 
previous best known bound of ^2^ mi- In the case of 
even n, the minimum known circuit depth is ||2” Ul- 


V. COMPARISON OF DECOMPOSITIONS 


We introduced three constructive decomposition schemes for arbitrary isometries from m to n qubits and derived a lower bound on the number of C-NOT gates required for such decompositions. The asymptotic results are summarized in Tables I and II. To compare the three decomposition schemes, we consider the ratios $c_K(m,n)$, $c_{CC}(m,n)$ and $c_{CSD}(m,n)$ of the C-NOT count for the optimized decomposition scheme of Knill, the column-by-column approach or the CSD approach, respectively, to that of the lower bound for an m to n isometry. First note that for $m \ge 5$ and for large enough $n$ the optimized decomposition scheme of Knill performs similarly to the column-by-column decomposition (i.e., $c_K(m,n) \simeq c_{CC}(m,n)$). For $m \le 4$ we have $c_{CC}(m,n) \simeq 2$ and $c_K(m,n)$ varies between $c_K(4,n) \simeq 2$ (if $n$ is even) and $c_K(0,n) \simeq 4.8$ (if $n$ is odd). Hence the column-by-column decomposition requires fewer C-NOT gates if $m \le 4$ (and $n$ is large). In the case $m \approx n$, the CSD approach may outperform the other two decompositions. For any natural number $d$ and for sufficiently large $n$, we have $c_{CC}(n-d,n) \simeq 2^{d+2}/(2^{d+1} - 1)$ (and $c_{CC}(n-d,n) \simeq c_K(n-d,n)$) and $c_{CSD}(n-d,n) \simeq \frac{23}{36} \cdot \frac{2^{2d+1}+1}{2^{d+1}-1}$. In particular $c_{CC}(n-2,n) \simeq 2.3$ and $c_{CC}(n-1,n) \simeq 2.7$ for large $n$. For $m = n-1$ we can use the CSD approach to again reach $c_{CSD}(n-1,n) \simeq 1.9$ for large $n$.
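The quoted ratios can be reproduced from leading-order counts. The sketch below assumes that the lower bound behaves to leading order as $\frac{1}{4}(2^{n+m+1} - 2^{2m})$ and that the column-by-column count is $2^{m+n}$; the expression used for $c_{CSD}$, namely $\frac{23}{36}(2^{2d+1}+1)/(2^{d+1}-1)$, is likewise an assumption, chosen because it reproduces the quoted value 1.9 for $d = 1$. All function names are illustrative.

```python
from fractions import Fraction


def lower_bound_leading(m, n):
    """Assumed leading-order lower bound: (2^(n+m+1) - 2^(2m)) / 4."""
    return Fraction(2 ** (n + m + 1) - 2 ** (2 * m), 4)


def c_cc(d, n=40):
    """Ratio of the column-by-column count 2^(m+n) to the lower bound, m = n - d."""
    m = n - d
    return Fraction(2 ** (m + n)) / lower_bound_leading(m, n)


def c_cc_closed(d):
    """Closed form 2^(d+2) / (2^(d+1) - 1)."""
    return Fraction(2 ** (d + 2), 2 ** (d + 1) - 1)


def c_csd_closed(d):
    """Assumed closed form (23/36) * (2^(2d+1) + 1) / (2^(d+1) - 1)."""
    return Fraction(23, 36) * Fraction(2 ** (2 * d + 1) + 1, 2 ** (d + 1) - 1)


if __name__ == "__main__":
    for d in (0, 1, 2, 3):
        print(d, float(c_cc_closed(d)), float(c_csd_closed(d)))
```

Under these assumptions the CSD ratio is the smaller one for $d = 0$ and $d = 1$, while the column-by-column ratio is smaller from $d = 2$ onwards.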

The column-by-column decomposition and the CSD-approach also perform well for small $m$ and $n$. We give a step by step description of how to decompose m to $n \le 4$ isometries in Appendix B. The results are summarized in Table III.

In addition we could use the CSD-approach (and a 
technical trick) to lower the C-NOT count for state prepa¬ 
ration. In particular we could lower the lowest known C- 
NOT count for state preparation on 4 qubits from 9 [l^ to 
8 C-NOTs and on 5 qubits from 26 [l3,[i3 C-NOTs 
(cf. Appendix A 5).

The column-by-column decomposition performs simi¬ 
larly to the optimized decomposition of Knill with re¬ 
spect to the C-NOT count, but there are other differ¬ 
ences that should be noted. For example, the column-by¬ 
column decomposition adapts quite well to implementa¬ 
tions where we only allow nearest neighbour C-NOT gates 
(cf. Remark 3). The optimized decomposition scheme of Knill has the advantage that some of the gates can be performed in parallel (cf. the circuit diagrams in Section IV B).

Another important difference between the column-by¬ 
column decomposition and the optimized decomposition 
of Knill is their dependence on the efficiency of the de¬ 
composition of their building blocks. In the first case, any 
improvement of the leading order of the C-NOT count of 
uniformly controlled gates (up to diagonal gates) leads 
to an improvement of the leading order of the C-not 
count for isometries (cf. Theorem 3), whereas in the second case the leading order of the C-NOT count depends on the leading order of the C-NOT count for arbitrary unitary gates (cf. Theorem 2).

Remark 6 Another interesting black box relation can be extracted from [23], where the Sinkhorn normal form for unitary matrices is used to decompose a unitary into a sequence of diagonal gates and discrete Fourier transforms (cf. Corollary 1 of [23]). Since we can perform
the discrete Fourier transform with a polynomial number 
of gates, they do not contribute to the leading order of the 
C-NOT count of this decomposition. Therefore, this de¬ 
composition allows us to relate the efficiency with which 
we can decompose a unitary with the decomposition of 
diagonal gates. 

VI. APPLICATION TO QUANTUM 
OPERATIONS 

Experimental groups strive to demonstrate their abil¬ 
ity to control a small number of qubits, and the ultimate 
demonstration would be the ability to do any quantum 
operation on them (i.e., any completely positive trace¬ 
preserving (CPTP) map). Since any such operation can 
be implemented via an isometry followed by partial trace 
(using Stinespring’s theorem), we can use our decompo¬ 
sition scheme for isometries to efficiently synthesize arbi¬ 
trary CPTP maps. 

Indeed, we can use a similar parameter counting argument as used to derive the lower bound for isometries to find a lower bound on the number of C-NOT gates required to implement arbitrary CPTP maps via a fixed quantum circuit topology. First we use the Choi-Jamiolkowski isomorphism [24-26] to simplify the parameter count. This isomorphism states that the set of all CPTP maps from a system A consisting of $m$ qubits to a system B consisting of $n$ qubits is isomorphic to the set of all density operators $\rho_{AB}$ on $H_A \otimes H_B$ satisfying $\mathrm{tr}_B(\rho_{AB}) = \frac{1}{2^m} I_A$. Since a density operator $\rho_{AB}$ is Hermitian, it can be described by $2^{2(m+n)}$ real parameters. The condition $\mathrm{tr}_B(\rho_{AB}) = \frac{1}{2^m} I_A$ corresponds to $2^{2m}$ constraints, and hence the determination of a CPTP map requires $2^{2(m+n)} - 2^{2m}$ real parameters.

We restrict our analysis of the lower bound to the following setting: For the implementation of a CPTP map $\mathcal{E}$ from an m-qubit system A to an n-qubit system B we allow the use of an arbitrary number $k$ of qubits on which we can perform C-NOT and single-qubit gates, before we trace out a system C consisting of $k - n$ qubits. (Since tracing out qubits commutes with quantum gates on the other qubits, without loss of generality, we can defer tracing out to the end of the circuit.) We then use a similar argument as used to derive the lower bound for isometries, but instead of commuting the $R_x$ and $R_z$ gates to the left of each C-NOT, we commute them to the right so that we perform arbitrary single-qubit unitaries on all of the qubits at the end of the circuit (reversing the order of circuit diagram (5)). Since we have unitary freedom on the system C (because $\mathrm{tr}_C((I_B \otimes U_C)\rho_{BC}(I_B \otimes U_C^\dagger)) = \mathrm{tr}_C(\rho_{BC})$), the single-qubit gates on each qubit of the system C at the end of the circuit cannot introduce additional parameters. Hence, using $r$ C-NOTs, we can introduce at most $4r + 3n$ real parameters. By the parameter count for a CPTP map given above, we conclude that a circuit topology has to consist of at least $\lceil \frac{1}{4} 4^m (4^n - 1) - \frac{3}{4} n \rceil$ C-NOTs in order that it can implement arbitrary CPTP maps from m to n qubits®.
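The parameter-counting argument can be turned into a small calculation. The sketch below assumes the parameter count $4^{m+n} - 4^m$ for a CPTP map and the bound of $4r + 3n$ parameters for a circuit with $r$ C-NOTs, and solves for the smallest admissible $r$; the function names are illustrative.

```python
def cptp_parameters(m, n):
    """Real parameters of a CPTP map from m to n qubits: 4^(m+n) - 4^m."""
    return 4 ** (m + n) - 4 ** m


def cptp_cnot_lower_bound(m, n):
    """Smallest r with 4r + 3n >= cptp_parameters(m, n), i.e. ceil((P - 3n) / 4)."""
    needed = cptp_parameters(m, n) - 3 * n
    return max(0, -(-needed // 4))  # integer ceiling division


if __name__ == "__main__":
    for m, n in [(1, 1), (1, 2), (2, 2), (3, 3)]:
        print(m, n, cptp_parameters(m, n), cptp_cnot_lower_bound(m, n))
```

For a single-qubit channel ($m = n = 1$) this gives 12 parameters and a bound of 3 C-NOTs.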

By Stinespring's theorem, every CPTP map $\mathcal{E}$ from an m-qubit system A to an n-qubit system B can be implemented with an isometry $V$ from system A to system BC, where the system C consists of (at most) $n + m$ qubits, followed by partial trace on C. We can use the column-by-column approach^ to decompose the isometry $V$, which requires $4^{m+n}$ C-NOTs to leading order (without exploiting the unitary freedom on C). Therefore we have found a way to implement an arbitrary quantum channel from m to n qubits in a constructive and exact way using about four times the number of C-NOTs required by the lower bound (for large enough $n$).

® For a more rigorous proof one could use a similar argument as given in [13, 15].

^ The optimized decomposition scheme of Knill also leads to a similar asymptotic result if $m \ge 5$.

Note that the results of this section are derived in the setting where the CPTP map is implemented in the quantum circuit model. However, this is not the only possibility. For example, alternative methods for the implementation of quantum channels are described in [?] and [?], which allow for additional classical randomness. In future work we will investigate how to use our approach in an alternative model that allows either measurements or classical randomness as additional resources, in order to further improve the C-NOT counts.

Note also that, by Naimark’s theorem, any POVM on 
a system A can be implemented using an isometry from 
system A to an enlarged system AB followed by a mea¬ 
surement on system B. Therefore our decomposition 
schemes for isometries can also be used for the imple¬ 
mentation of arbitrary POVMs. 
Appendix A: Technical details

In this section we give a rigorous proof that the column-by-column decomposition works for arbitrary m to n isometries and we give an explicit C-NOT count in the case $n \ge 8$. Since MCGs arise in the column-by-column decomposition, we first optimize the decomposition of such gates, based on the decomposition scheme of [3]. In addition we perform some optimizations for the CSD-approach (based on the Appendix of [13]) and for state preparation.


1. Decomposition of MCGs

In this section we describe how to efficiently decompose MCGs $C_{n-1,n}(U)$, where we focus on the special case of $C_{n-1,n}(W)$ gates, where $W \in SU(2)$. The decomposition schemes are based on those in [3], except that we use some technical tricks to reduce the number of C-NOTs needed. Note that the number of C-NOTs required is the same whether we control on one or zero, because we can always transform a gate controlled on $|0\rangle$ on a certain control-qubit of a MCG into a gate controlled on $|1\rangle$ using two $\sigma_x$ gates, as illustrated below.

[circuit diagram]

We denote a $k$-controlled NOT gate acting on $n$ qubits by $C_{k,n}(\sigma_x)$. In the case $k = 2$ with control on $|1\rangle \otimes |1\rangle$, we call such a gate a Toffoli gate.

Lemma 6 ($C_{1,2}(U)$ gates [3, Corollary 5.3]) Any $C_{1,2}(U)$ gate can be decomposed using two C-NOT gates, three special unitary gates $A$, $B$ and $C$ and a diagonal gate of the form $E = |0\rangle\langle 0| + e^{i\delta} |1\rangle\langle 1|$, where $\delta \in \mathbb{R}$.

[circuit diagram]

Lemma 7 ($C_{2,3}(U)$ gates [3, Lemma 6.1]) Any $C_{2,3}(U)$ gate can be decomposed as follows

[circuit diagram]

where $V^2 = U$.

Lemma 8 (Toffoli gates [3, Section VI A]) A Toffoli gate can be performed with 6 C-NOTs using the following circuit

[circuit diagram]

where $A = R_z(-\frac{\pi}{2}) R_y(\frac{\pi}{4})$, $B = R_y(-\frac{\pi}{4})$, $C = R_z(\frac{\pi}{2})$ and $E = |0\rangle\langle 0| + e^{i\pi/4} |1\rangle\langle 1|$.

Remark 7 ([3, Corollary 6.2]) By adjusting $A$, $B$, $C$ and $E$, the circuit topology in Lemma 8 can be used to generate $C_{2,3}(U)$ for any unitary $U$.

Proof. This circuit equivalence follows from Lemma 6 and Lemma 7 together with the following circuit identities.

[circuit diagrams]
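The single-qubit gates quoted in Lemma 8 can be checked for consistency with Lemma 6. The sketch below assumes the conventions $R_z(\theta) = \mathrm{diag}(e^{-i\theta/2}, e^{i\theta/2})$ and $R_y(\theta)$ as a real rotation, and reads the gates as $A = R_z(-\pi/2)R_y(\pi/4)$, $B = R_y(-\pi/4)$, $C = R_z(\pi/2)$ with phase $e^{i\pi/4}$ in $E$. It verifies that $ABC = I$ and that $V = e^{i\pi/4} A\sigma_x B\sigma_x C$ satisfies $V^2 = \sigma_x$, i.e., that these gates implement a controlled square root of $\sigma_x$ in the sense of Lemma 6. That the Toffoli circuit is assembled from exactly this controlled gate is an assumption of the sketch.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def product(*factors):
    """Left-to-right matrix product of several 2x2 matrices."""
    result = [[1, 0], [0, 1]]
    for factor in factors:
        result = matmul(result, factor)
    return result


def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


IDENTITY = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
A = product(rz(-math.pi / 2), ry(math.pi / 4))
B = ry(-math.pi / 4)
C = rz(math.pi / 2)
PHASE = cmath.exp(1j * math.pi / 4)

# V = e^{i pi/4} A X B X C should be a square root of sigma_x.
V = [[PHASE * entry for entry in row] for row in product(A, X, B, X, C)]

if __name__ == "__main__":
    print(close(product(A, B, C), IDENTITY), close(matmul(V, V), X))
```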




We can halve the C-NOT count if we are only interested in performing the Toffoli gate up to a diagonal gate.

Lemma 9 ([3, Section VI B]) Let $A := R_y(\frac{\pi}{4})$. We can decompose a Toffoli gate up to a diagonal gate with the following decomposition
Proof. To see this, note that if the second control-qubit is in the state $|0\rangle$, the least significant qubit is unchanged, since $AA^\dagger = I$. If the second control-qubit is in the state $|1\rangle$ and the first control-qubit in the state $|0\rangle$, the action on the least significant qubit is $A^2 \sigma_x (A^\dagger)^2$, which is $-|0\rangle\langle 0| + |1\rangle\langle 1|$. If both control-qubits are in the state $|1\rangle$, the action on the least significant qubit is $A \sigma_x A \sigma_x A^\dagger \sigma_x A^\dagger = \sigma_x$. We choose the diagonal gate $\Delta$ such that $|010\rangle$ is mapped to $-|010\rangle$. ■
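The three cases in this proof can be verified by direct matrix multiplication. The sketch below assumes $A = R_y(\pi/4)$ with $R_y(\theta)$ the real rotation matrix, and reads the operator strings as $A\sigma_x A\sigma_x A^\dagger\sigma_x A^\dagger$ (both controls on), $A^2\sigma_x(A^\dagger)^2$ (only the second control on) and $A\sigma_x A A^\dagger\sigma_x A^\dagger$ (only the first control on); helper names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def product(*factors):
    """Left-to-right matrix product of several 2x2 matrices."""
    result = [[1, 0], [0, 1]]
    for factor in factors:
        result = matmul(result, factor)
    return result


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


IDENTITY = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
A = ry(math.pi / 4)
A_DAG = ry(-math.pi / 4)

both_on = product(A, X, A, X, A_DAG, X, A_DAG)   # expected: sigma_x
second_only = product(A, A, X, A_DAG, A_DAG)     # expected: -|0><0| + |1><1|
first_only = product(A, X, A, A_DAG, X, A_DAG)   # expected: identity

if __name__ == "__main__":
    print(close(both_on, X),
          close(second_only, [[-1, 0], [0, 1]]),
          close(first_only, IDENTITY))
```

The sign in the second case is what the diagonal gate has to compensate.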

Lemma 10 (Diagonal gates commute with UCGs) 



Proof. By inspection. ■ 

Lemma 11 ($C_{k,n}(\sigma_x)$, $k \le \lceil n/2 \rceil$) Let $n \ge 5$ denote the total number of qubits considered and $k \in \{1, \dots, \lceil n/2 \rceil\}$, then we can implement a $C_{k,n}(\sigma_x)$ gate with at most $(8k - 6)$ C-NOTs.

Note that the case $k = 1$ is trivial and the case $k = 2$ is implied by Lemma 8 (although we know of a tighter bound in both cases).

To illustrate the idea in the remaining cases, consider the decomposition leading to the desired C-NOT count for $k = 4$, $n = 7$. Lemma 7.2 of [3] shows that

[circuit diagram: action part, reset part]

However, we consider instead the alternative decomposition

[circuit diagram: action part, reset part]

To see that this is also valid, note that the diagonal gates $\Delta_i$ are of the same kind as introduced in Lemma 9 and therefore $\Delta_i^\dagger = \Delta_i$. By Lemma 10 the two $\Delta_2$ and $\Delta_1$ gates cancel each other out. In addition, the combination of all gates between the two $\Delta_0$ gates together correspond to a UCG acting only on the least significant (lowest) qubit, and hence the two $\Delta_0$ gates cancel each other out by Lemma 10.

The Toffoli gates that don't act on the least significant qubit can be decomposed together with the diagonal gates using Lemma 9. This leads to the following decomposition of the action part of the last circuit

[circuit diagram]

where $A = R_y(\frac{\pi}{4})$. The marked gates cancel each other out, because they commute with the gates between them. The reset part can be decomposed analogously.

Proof of Lemma 11. First we apply Lemma 7.2 of [3] (a circuit diagram for the case $k = 5$ and $n = 9$ can be found in [3]). By similar arguments as used in the special case above, we introduce a corresponding diagonal gate for each Toffoli gate apart from the two that act on the least significant qubit (i.e., on the target qubit of the $C_{k,n}(\sigma_x)$ gate).

The required C-NOT count for $C_{k,n}(\sigma_x)$ is thus equal to twice that required for the reset part plus the number of C-NOTs needed to implement the Toffoli gates that form the first and last gate in the action part. By Lemma 8 the two Toffoli gates can be decomposed using 12 C-NOTs. One reset part uses $4(k - 3) + 3$ C-NOTs. This leads to the claimed count. ■

Lemma 12 ($C_{n-2,n}(\sigma_x)$ [3, Lemma 7.3]) Let $n \ge 5$ denote the total number of qubits considered. A $C_{n-2,n}(\sigma_x)$ gate can be decomposed into two $C_{k,n}(\sigma_x)$ and two $C_{n-k-1,n}(\sigma_x)$ gates, where $k \in \{2, 3, \dots, n-3\}$.

For example, the decomposition for $n = 7$ and $k = 4$ is shown in the following circuit diagram.

[circuit diagram]

Theorem 13 ($C_{n-1,n}(U)$) Let $n \ge 3$ and $U$ be a single-qubit unitary. We can decompose a $C_{n-1,n}(U)$ gate using at most $16n^2 - 60n + 42$ C-NOTs.
TABLE IV: C-NOT counts and numbers of real parameters that can be introduced into a circuit by a specific gate, for various controlled gates.

Gate | Notation | C-NOT count (upper bound) | Real parameters
UCG (up to a diagonal gate) | $\Delta C^u_{n-1}(U)$ | $2^{n-1} - 1$ [16] | $2^n$
Uniformly controlled rotation | $C^u_{n-1}(R_z)$ / $C^u_{n-1}(R_y)$ | $2^{n-1}$ [19, 22] | $2^{n-1}$
Multi controlled unitary gate | $C_{n-1,n}(U)$ | $16n^2 - 60n + 42$ if $n \ge 3$ (Thm. 13) | 4
Multi controlled special unitary gate | $C_{n-1,n}(W)$, $W \in SU(2)$ | $28n - 88$ if $n \ge 8$ is even (Thm. 14); $28n - 92$ if $n \ge 8$ is odd (Thm. 14) | 3
Multi controlled Toffoli gate | $C_{k,n}(\sigma_x)$ | $8k - 6$ if $n \ge 5$, $k \in \{3, \dots, \lceil n/2 \rceil\}$ (Lemma 11) | 0


Proof. The idea is contained in the following diagram in which $V$ is chosen such that $V^2 = U$ (see Lemma 7.5 of [3]).

[circuit diagram]

Using Lemma 6 this gives the relation $N_{C_{n-1,n}(U)} = N_{C_{n-2,n}(U)} + 4 + 2 N_{C_{n-2,n}(\sigma_x)}$. For simplicity, we consider the $C_{n-2,n}(U)$ gate as a $C_{n-2,n-1}(U)$ gate. This will lead to an overcount in our final C-NOT count. Using Lemma 12 we have $N_{C_{n-2,n}(\sigma_x)} = 2 \left( N_{C_{\lceil n/2 \rceil - 1, n}(\sigma_x)} + N_{C_{\lfloor n/2 \rfloor, n}(\sigma_x)} \right)$ for $n \ge 5$ and hence, from Lemma 11, $N_{C_{n-2,n}(\sigma_x)} \le 16n - 40$ for $n \ge 5$. Note that Lemma 8 implies that the same bound also holds for $n = 4$ (although we know of a tighter bound in this case). Thus, we wish to solve the recursion $N_{C_{n-1,n}(U)} = N_{C_{n-2,n-1}(U)} + 32n - 76$. Noting that $N_{C_{2,3}(U)} = 6$ (cf. Remark 7) we obtain the stated count. ■
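The recursion in this proof is easy to iterate. The sketch below assumes the step $N_{C_{n-1,n}(U)} = N_{C_{n-2,n-1}(U)} + 4 + 2(16n - 40)$ for $n \ge 4$ with starting value 6 for $n = 3$, and compares the result with the closed form $16n^2 - 60n + 42$ of Theorem 13; function names are illustrative.

```python
def toffoli_chain_bound(n):
    """Bound 16n - 40 on the C-NOT count of a C_{n-2,n}(sigma_x) gate."""
    return 16 * n - 40


def mcg_count_recursive(n):
    """Iterate N(n) = N(n-1) + 4 + 2 * (16n - 40), starting from N(3) = 6."""
    count = 6
    for j in range(4, n + 1):
        count += 4 + 2 * toffoli_chain_bound(j)
    return count


def mcg_count_closed(n):
    """Closed form of Theorem 13: 16 n^2 - 60 n + 42."""
    return 16 * n * n - 60 * n + 42


if __name__ == "__main__":
    for n in range(3, 10):
        print(n, mcg_count_recursive(n), mcg_count_closed(n))
```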

Note that this count could be improved. However, it turns out that the case $W \in SU(2)$ is particularly useful. In this case we make more effort with the optimizations leading to the following.

Theorem 14 ($C_{n-1,n}(W)$, where $W \in SU(2)$) Let $n \ge 8$ and $W \in SU(2)$. We can decompose a $C_{n-1,n}(W)$ gate using at most $(28n - 88)$ C-NOTs if $n$ is even and $(28n - 92)$ C-NOTs if $n$ is odd.

Proof. To aid the proof, we provide illustrations for the case $n = 8$. By Lemma 7.9 of [3] there exist quantum gates $A, B, C \in SU(2)$ such that we can decompose the $C_{n-1,n}(W)$ gate as follows.

[circuit diagram]

By Lemma 12 we can decompose the $C_{n-2,n}(\sigma_x)$ gates using two $C_{k_1,n}(\sigma_x)$ and two $C_{k_2,n}(\sigma_x)$ gates, where we set $k_2 = \lceil n/2 \rceil$ and $k_1 = n - k_2 - 1$. In our example $k_1 = 3$ and $k_2 = 4$:

[circuit diagram]

Since the $C_{n-2,n}(\sigma_x)$ gate is its own inverse, we can use the inverted decomposition scheme to decompose the second $C_{n-2,n}(\sigma_x)$ gate. We can decompose the gates $C_{k_1,n}(\sigma_x)$ and $C_{k_2,n}(\sigma_x)$ using Lemma 11. Note that this works for all $n \ge 8$, since $3 \le k_1, k_2 \le \lceil n/2 \rceil$. We can lower the C-NOT count with some technical tricks. As in the proof of Corollary 7.4 of [3] we can decompose all Toffoli gates not acting on the least significant qubit up to diagonal gates. This can be seen by reversing the decomposition scheme of Lemma 11 for the second and fourth $C_{k_1,n}(\sigma_x)$ gate and using Lemma 10. Therefore, using the same technique as in Lemma 11 but implementing all Toffoli gates up to diagonal gates, we can decompose each of the $C_{k_1,n}(\sigma_x)$ gates using $(8k_1 - 6) - 2 \cdot 6 + 2 \cdot 2 = 8k_1 - 14$ C-NOTs.

Now consider the marked part of the last circuit. By Lemma 11 this can be decomposed using

[circuit diagram]

where, to simplify, we have not explicitly illustrated the diagonal gates. The two reset parts commute with the controlled $B$ gate, since they don't act on the two least significant qubits, and cancel out. Therefore each of the marked $C_{k_2,n}(\sigma_x)$ gates uses $(8k_2 - 6) - (4(k_2 - 3) + 3) = 4k_2 + 3$ C-NOTs. We decompose the other two $C_{k_2,n}(\sigma_x)$ gates exactly as in Lemma 11. Using Lemma 6 for the three single controlled gates then leads to the claimed C-NOT count. ■
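The bookkeeping of this proof can be tallied explicitly. The sketch below assumes $k_2 = \lceil n/2 \rceil$ and $k_1 = n - k_2 - 1$ (so $k_1 = 3$, $k_2 = 4$ for $n = 8$), four $C_{k_1,n}(\sigma_x)$ gates at $8k_1 - 14$ C-NOTs each, two marked $C_{k_2,n}(\sigma_x)$ gates at $4k_2 + 3$ each, two unmarked ones at $8k_2 - 6$ each, and three singly controlled gates at 2 C-NOTs each. Function names are illustrative.

```python
def lemma11_count(k):
    """C_{k,n}(sigma_x) count: two reset parts plus two full Toffoli gates."""
    reset_part = 4 * (k - 3) + 3
    return 2 * reset_part + 12


def su2_mcg_count(n):
    """Total C-NOT count for a C_{n-1,n}(W) gate, W in SU(2), following the proof."""
    k2 = (n + 1) // 2          # ceil(n / 2)
    k1 = n - k2 - 1
    single_controlled = 3 * 2  # three C_{1,2} gates, two C-NOTs each (Lemma 6)
    k1_gates = 4 * (8 * k1 - 14)
    marked_k2_gates = 2 * (4 * k2 + 3)
    other_k2_gates = 2 * lemma11_count(k2)
    return single_controlled + k1_gates + marked_k2_gates + other_k2_gates


def theorem14_bound(n):
    return 28 * n - 88 if n % 2 == 0 else 28 * n - 92


if __name__ == "__main__":
    for n in range(8, 16):
        print(n, su2_mcg_count(n), theorem14_bound(n))
```

With these assignments the tally reproduces both the even and the odd case of Theorem 14.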
[figure: coefficient vectors of the state before and after applying the gate $A$, for the cases $k_s = 0$ and $k_s = 1$]

FIG. 3: Using a quantum gate $A$ to disentangle the $(n-s)$th qubit into the state $k_s = 0$ or $k_s = 1$ respectively.


2. Overview of C-NOT counts for controlled gates

We summarize C-NOT counts for some commonly-used uniformly and not uniformly controlled gates in Table IV. Note that implementing a uniformly controlled $C^u_{n-1}(U)$ gate up to a diagonal gate $\Delta$ means that we implement $\Delta C^u_{n-1}(U)$, for some diagonal gate $\Delta$. The number of real parameters required to specify a particular gate is shown in the final column and follows from Lemma 1 and the block diagonal form of the uniformly controlled gates (see also the argument used to derive the lower bound for isometries in Section III). For example, a $C^u_{n-1}(U)$ gate is described by $2^{n-1}$ $(2 \times 2)$-unitaries. By Lemma 1 this corresponds to $4 \cdot 2^{n-1}$ real parameters. Since a diagonal gate $\Delta$ on $n$ qubits is described by $2^n$ real parameters, a $\Delta C^u_{n-1}(U)$ gate is described by $4 \cdot 2^{n-1} - 2^n = 2^n$ real parameters.

3. Rigorous proof of the decomposition scheme described in Section IV C and exact C-NOT count

We begin this section by introducing some additional notation. For $m' \in \mathbb{N}$ and $k \in \{0, 1, \dots, 2^{m'} - 1\}$ we use the notation: $k = [k_{m'-1}, k_{m'-2}, \dots, k_0] := \sum_{i \ge 0} k_i 2^i$, i.e., $\{k_i\}$ are the binary digits of $k$. For $s \in \mathbb{N}_0$ we define $a^k_s, b^k_s \in \mathbb{N}_0$ by $k = a^k_s 2^s + b^k_s$, such that $a^k_s$ is maximal. For $s \in \{1, 2, \dots, n' - 1\}$, where $n' \in \mathbb{N}_{\ge 2}$ and $n' \ge m'$, we can also write $a^k_s = [k_{n'-1}, k_{n'-2}, \dots, k_s]$ and $b^k_s = [k_{s-1}, k_{s-2}, \dots, k_0]$.

We now consider an elementary step in the decomposition scheme. Let $n \in \mathbb{N}_{\ge 2}$, $m \in \mathbb{N}$ with $n \ge m$, $k \in \{1, 2, \dots, 2^n - 1\}$ and $s \in \{0, 1, \dots, n-2\}$. Furthermore suppose $|\psi\rangle$ is an n-qubit state of the form

$|\psi\rangle = \left( \sum_{l = a^k_s}^{2^{n-s} - 1} c_l |l\rangle \right) \otimes |k_{s-1} k_{s-2} \cdots k_0\rangle$, (A1)

where $c_l \in \mathbb{C}$ for all $l \in \{a^k_s, a^k_s + 1, \dots, 2^{n-s} - 1\}$. Since it is clear from the context that, e.g., $|l\rangle \in \mathcal{H}_{n-s}$, we shorten the notation and write $|l\rangle$ instead of $|l\rangle_{n-s}$. [Note that we use the following convention: If $s - 1 < 0$, we mean that the part $|k_{s-1} k_{s-2} \cdots k_0\rangle$ in equation (A1) does not exist, i.e., for $s = 0$ the statement of equation (A1) is: $|\psi\rangle = \sum_{l = a^k_0}^{2^n - 1} c_l |l\rangle$. Analogously, [?] means that no such part exists in the considered expression. Similarly we set $\{n_s, \dots, n_e\} = \emptyset$ if $n_e < n_s$.]

Lemma 15 Take $|\psi^e\rangle := \sum_{l = a^k_s}^{2^{n-s} - 1} c_l |l\rangle$, where "e" stands for entangled, and assume that

$c_{2a^k_{s+1} + 1} = 0$ if $k_s = 0$ and $b^k_{s+1} \ne 0$. (A2)

There exists a UCG $A := C^u_{n-1-s}(U)$ of the form

$A = \sum_{l=0}^{2^{n-s-1} - 1} |l\rangle\langle l| \otimes U_l \otimes I_{2^s}$ (A3)

such that $|\psi'\rangle := A |\psi\rangle$ has the form

$|\psi'\rangle = \left( \sum_{l = a^k_{s+1}}^{2^{n-(s+1)} - 1} c'_l |l\rangle \right) \otimes |k_s k_{s-1} \cdots k_0\rangle$, (A4)

where $c'_l \in \mathbb{C}$ for all $l \in \{a^k_{s+1}, a^k_{s+1} + 1, \dots, 2^{n-(s+1)} - 1\}$. Additionally, $A$ has the property that

$A |i\rangle = |i\rangle$ for all $i \in \{0, 1, \dots, k-1\}$. (A5)

Proof. The following proof depends on whether $k_s = 0$ or $k_s = 1$. In the case $k_s = 0$ we also have to distinguish between the cases $b^k_{s+1} = 0$ and $b^k_{s+1} \ne 0$. The reader might find it useful to read the proof first considering only the case $k_s = 1$ (and therefore $b^k_{s+1} \ne 0$).

Considering blocks of two elements, there exist two possible forms of $|\psi^e\rangle$, depending on whether $k_s = 0$ or $k_s = 1$. If $k_s = 0$, then $a^k_s = 2a^k_{s+1}$ is even and therefore $|\psi^e\rangle$ begins with an even number of zeros (assuming $c_{a^k_s} \ne 0$). If $k_s = 1$, then $a^k_s = 2a^k_{s+1} + 1$ is odd and $|\psi^e\rangle$ begins with an odd number of zeros (see Fig. 3). By equation (A3) the quantum gate $A$ leaves the $s$ lower significant qubits invariant and we can write: $A |\psi\rangle = \left( \sum_l c'^e_l |l\rangle \right) \otimes |k_{s-1} k_{s-2} \cdots k_0\rangle$ for some coefficients $c'^e_l \in \mathbb{C}$. We define $|\psi'^e\rangle := \sum_l c'^e_l |l\rangle$. We want to find a gate $A$, such that for $l' \in \{0, 1, \dots, 2^{n-s-1} - 1\}$: $c'^e_{2l'+1} = 0$ if $k_s = 0$, and $c'^e_{2l'} = 0$ if $k_s = 1$, i.e., we want to disentangle the $(n-s)$th qubit into the state $|k_s\rangle$.

We now determine the UCG $A$. To ensure that $A$ fulfils equation (A5) we set:

$U_l = I$ for $l \in \{0, 1, \dots, a^k_{s+1}\}$ if $b^k_{s+1} \ne 0$, (A6a)

$U_l = I$ for $l \in \{0, 1, \dots, a^k_{s+1} - 1\}$ if $b^k_{s+1} = 0$. (A6b)

If the gate $A$ is not already fully specified by equation (A6), we use Lemma 4 to determine the gates $U_l$ for $l \in \{a^k_{s+1} + 1, a^k_{s+1} + 2, \dots, 2^{n-1-s} - 1\}$ if $b^k_{s+1} \ne 0$ and for $l \in \{a^k_{s+1}, a^k_{s+1} + 1, \dots, 2^{n-1-s} - 1\}$ if $b^k_{s+1} = 0$:

$U_l \begin{pmatrix} c_{2l} \\ c_{2l+1} \end{pmatrix} = r \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ if $k_s = 0$, (A7a)

$U_l \begin{pmatrix} c_{2l} \\ c_{2l+1} \end{pmatrix} = r \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ if $k_s = 1$, (A7b)

where $r \in \mathbb{R}$. [Note that if $b^k_{s+1} = 0$ and $l = a^k_{s+1}$, the gate $A$ acts trivially on $|i\rangle$ for all $i \in \{0, 1, \dots, k-1\}$, because of the form of the gate $A$ and since $a^i_{s+1} < a^k_{s+1}$ for all $i \in \{0, 1, \dots, k-1\}$ in the considered case.]

With this choice of the gate $A$ we conclude: For all $l \in \{a^k_{s+1} + 1, a^k_{s+1} + 2, \dots, 2^{n-1-s} - 1\}$ we have $c'^e_{2l+1} = 0$ if $k_s = 0$ and $c'^e_{2l} = 0$ if $k_s = 1$. Because of the initial form of $|\psi^e\rangle$ and the construction of the gate $A$ we conclude further that $c'^e_{l'} = 0$ for $l' \in \{0, 1, \dots, 2a^k_{s+1} - 1\}$. It remains to consider the two coefficients $c'^e_{2a^k_{s+1}}$ and $c'^e_{2a^k_{s+1}+1}$.

If $k_s = 0$ and $b^k_{s+1} = 0$, then we can zero the coefficient $c'^e_{2a^k_{s+1}+1}$ with the gate $A$ (see equation (A7a)). In the case $k_s = 0$ and $b^k_{s+1} \ne 0$ the coefficient $c_{2a^k_{s+1}+1}$ is zero by assumption and we act trivially on it with the gate $A$ by equation (A6a). If $k_s = 1$, then $c'^e_{2a^k_{s+1}} = 0$ because the corresponding entry in $|\psi^e\rangle$ is initially zero by equation (A1) and $A$ acts trivially on it by equation (A6a).

So in all cases we can write $|\psi'^e\rangle = \left( \sum_{l = a^k_{s+1}}^{2^{n-(s+1)} - 1} c'_l |l\rangle \right) \otimes |k_s\rangle$, for some $c'_l \in \mathbb{C}$ (see Fig. 3). Therefore, $A |\psi\rangle$ is of the desired form (A4) and by construction $A$ satisfies equation (A5). ■

FIG. 4: Decomposition scheme of a quantum gate $G_k$. The notation surrounded by the square signifies either a control on one or on zero.


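Lemma 16 Let $k \in \{1, 2, \dots, 2^n - 1\}$ and $s \in \{0, 1, \dots, n-1\}$ be such that $k_s = 0$ and $b^k_{s+1} \ne 0$. Let $|\psi\rangle$ be an n-qubit state of the form equation (A1). Then there exists a MCG $B := C_{n-1}(U)$, whose non-trivial part is of the form $|K_1\rangle\langle K_1| \otimes U \otimes |K_0\rangle\langle K_0|$, where $K_1 = [k_{n-1}, k_{n-2}, \dots, k_{s+1}]$ and $K_0 = [k_{s-1}, k_{s-2}, \dots, k_0]$, such that we can write

$|\psi'\rangle := B |\psi\rangle = \left( \sum_{l = a^k_s}^{2^{n-s} - 1} c'_l |l\rangle \right) \otimes |k_{s-1} k_{s-2} \cdots k_0\rangle$, (A8)

where $c'_l \in \mathbb{C}$ for all $l \in \{a^k_s, a^k_s + 1, \dots, 2^{n-s} - 1\}$ and $c'_{2a^k_{s+1}+1} = 0$. In addition, $B$ leaves the first $k$ basis states invariant:

$B |i\rangle = |i\rangle$ for all $i \in \{0, \dots, k-1\}$. (A9)

Proof. Since $k_s = 0$ the condition (A9) is satisfied by construction of the gate $B$. We define the gate $U$ with Lemma 4 such that

$U \begin{pmatrix} c_{2a^k_{s+1}} \\ c_{2a^k_{s+1}+1} \end{pmatrix} = r \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, (A10)

where $r \in \mathbb{R}$. ■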
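The single-qubit gates required in (A7) and (A10) map a pair of amplitudes onto a real multiple of a basis vector. One standard way to build such a gate inside $SU(2)$ is sketched below; the function name and the particular choice of matrix are assumptions made for illustration, not necessarily the construction used in the lemma cited above.

```python
import math


def disentangling_unitary(c0, c1, target=0):
    """Return (U, r) with U in SU(2) and U (c0, c1)^T = r e_target, r >= 0 real."""
    c0, c1 = complex(c0), complex(c1)
    r = math.hypot(abs(c0), abs(c1))
    if r == 0:
        return [[1, 0], [0, 1]], 0.0  # nothing to rotate
    a, b = c0 / r, c1 / r
    if target == 0:
        unitary = [[a.conjugate(), b.conjugate()], [-b, a]]
    else:
        unitary = [[b, -a], [a.conjugate(), b.conjugate()]]
    return unitary, r


def apply(unitary, vector):
    """Apply a 2x2 matrix to a length-2 vector."""
    return [unitary[i][0] * vector[0] + unitary[i][1] * vector[1] for i in range(2)]


def determinant(unitary):
    return unitary[0][0] * unitary[1][1] - unitary[0][1] * unitary[1][0]


if __name__ == "__main__":
    gate, norm = disentangling_unitary(0.6j, -0.8, target=0)
    print(apply(gate, [0.6j, -0.8]), norm, determinant(gate))
```

Both variants have determinant 1, which is consistent with the observation that only special unitary gates are needed for the MCGs.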

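Lemma 17 (One column of an isometry) Let $k \in \{1, 2, \dots, 2^n - 1\}$. Let $|\psi\rangle \in \mathcal{H}_n$ be an n-qubit state such that $\langle i|\psi\rangle = 0$ for $i \in \{0, 1, \dots, k-1\}$. There exists a quantum gate $G_k$ with the following properties:

$G_k |i\rangle = e^{i\phi_i} |i\rangle$, $i \in \{0, 1, \dots, k-1\}$, (A11)

$G_k |\psi\rangle = e^{i\phi_k} |k\rangle$, (A12)

where $\phi_i \in \mathbb{R}$ for all $i \in \{0, 1, \dots, k\}$.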
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Proof. We claim that we can implement the operator 
Gfc with a circuit of the form as shown in Fig. 01 

[Note that we have interchanged the order of the MCGs 
and the UCGs compared with Section IIV Cl We are al¬ 
lowed to do this, since the gates commute by their con¬ 
struction.] 

The structure of this decomposition is based on the 
idea used for state preparation in [I^. The diagonal 
gates in are present so we can use the 

efficient decomposition of the UCGs up to diagonal gates 
in [l^. Note that we never use the MCG G„_i(t/o), since 
we can absorb it into the UCG G“_]^(t/g). Formally we 
write: 

n—1 n—1 

Gfc = n O. := n (As ® 

s—0 s—0 

To keep the notation simple, we do not write down which of the $n$ qubits are the control and target qubits. The target qubit of the controlled gates with lower index $s$ is the $(n-s)$th qubit. We consider all controlled gates as $n$-qubit gates. If there are free qubits, i.e., qubits that are neither controlled nor acted on, they are the least significant ones.

We use Lemma [15] recursively to disentangle one qubit 
after another starting from the state ]'!/')■ More for¬ 
mally: We define the state IV's) := OsA^o 
s G {l,2,...,n} and we set j^o) := IV’)- To determine 
the gate G]J_i_g(C/,V) for s G {0,1,..., n — 2} we ap¬ 
ply Lemma [15] on the state jV’s) := G„_i(C/s) jV's)- If 
$k_s = 0$ and $b^k_{s+1} \neq 0$, $|\psi_s\rangle$ does not satisfy the condition (A2) for Lemma 15 in general. In this case we can determine the MCG $C_{n-1}(U_s)$ by Lemma 16 such that $|\psi'_s\rangle$ satisfies the condition (A2). In all other cases we
set Cn-iiUs) = I- Note, that the diagonal gate 
leaves the form of the state C!^_i_s{Ug) \ip's) invariant 
up to phase shifts. 

In the case s = n — 1 we have and so either the 

most significant qubit is initially disentangled (fc„_i = 
1) or can be disentangled with the MCG G„_i(t/„_i), 
determined by Lemma 1161 (fc^-i = 0). Therefore we set 
<^o (t^n-i) = I and A„_i = I. 

By construction, the operators Os leave the states 
invariant (up to phase shifts caused by 
the diagonal gates). ■ 

Lemma 18 (C-NOT count for one column) Let $k \in \{1, 2, \dots, 2^n-1\}$. We can decompose a quantum gate $G_k$, which is of the form described in Lemma 17, using at most $(2^n - n - 1) + Q^k(n)\,N_{C_{n-1}(U)}$ C-NOTs, where $Q^k(n) := |\{s : k_s = 0 \wedge b^k_{s+1} \neq 0,\ s \in \{0, 1, \dots, n-1\}\}|$ and $N_{C_{n-1}(U)}$ denotes the number of C-NOTs used to decompose a $C_{n-1}(U)$ gate.

Proof. To decompose the quantum gate $G_k$ we use the decomposition scheme described in the proof of Lemma 17. The number of C-NOTs used to decompose the UCGs (together with the diagonal gates) gives a count of $\sum_{s=0}^{n-1}(2^{n-1-s} - 1) = 2^n - n - 1$ C-NOTs. By the construction in the proof of Lemma 17 we conclude that the number of MCGs used for the decomposition of $G_k$ is at most $Q^k(n)$. We add the number of C-NOTs used to decompose $Q^k(n)$ MCGs to the C-NOT count used to decompose the UCGs and get the claimed count. ■
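
The per-column bound of Lemma 18 can be evaluated mechanically. The following is a minimal sketch, not part of the original construction: it assumes that $k_s$ is bit $s$ of $k$ and that $b^k_{s+1} = k \bmod 2^{s+1}$ (as used in the proof of Corollary 19), and it treats the MCG cost as a free parameter `mcg_cost`, a hypothetical name standing in for $N_{C_{n-1}(U)}$.

```python
def ucg_cnots(n: int) -> int:
    """C-NOTs for the n UCGs (each up to a diagonal gate):
    sum over s of (2^(n-1-s) - 1)."""
    return sum(2 ** (n - 1 - s) - 1 for s in range(n))


def mcg_count(k: int, n: int) -> int:
    """Q^k(n): number of s in {0,...,n-1} with k_s = 0 and k mod 2^(s+1) != 0."""
    return sum(
        1
        for s in range(n)
        if ((k >> s) & 1) == 0 and (k % (1 << (s + 1))) != 0
    )


def column_bound(k: int, n: int, mcg_cost: int) -> int:
    """Upper bound of Lemma 18 for one column, for a given cost per MCG."""
    return ucg_cnots(n) + mcg_count(k, n) * mcg_cost


if __name__ == "__main__":
    # The UCG contribution agrees with the closed form 2^n - n - 1.
    for n in range(1, 8):
        print(n, ucg_cnots(n), 2 ** n - n - 1)
```

Keeping `mcg_cost` as an argument leaves the choice of MCG decomposition open, since the lemma itself is stated for an unspecified $N_{C_{n-1}(U)}$.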

Corollary 19 The number of MCGs $Q(m, n)$ used to decompose all operators in $\{G_i\}_{i \in \{1, 2, \dots, 2^m-1\}}$ using the decomposition scheme as in the proof of Lemma 17 is given by:

$$Q(m, n) = 2^m\left(n - \frac{m}{2} - 1\right) - n + m + 1. \tag{A13}$$

Proof. We define the indicator function $I(k, s)$ by:

$$I(k, s) := \begin{cases} 1 & \text{if } k_s = 0 \wedge b^k_{s+1} \neq 0, & \text{(A14a)} \\ 0 & \text{otherwise.} & \text{(A14b)} \end{cases}$$

In other words $I(k, s) = \delta_{k_s,0}\,(1 - \delta_{b^k_{s+1},0}) = \delta_{k_s,0} - \delta_{b^k_{s+1},0}$, since $b^k_{s+1} = 0$ implies $k_s = 0$. Now we can write $Q^k(n) = \sum_{s=0}^{n-1} I(k, s)$. By Lemma [H]

$$Q(m, n) = \sum_{k=1}^{2^m-1} Q^k(n) = \sum_{s=0}^{n-1} Q_s(m), \tag{A15}$$

where $Q_s(m) := \sum_{k=1}^{2^m-1} I(k, s)$ denotes the number of MCGs acting on the $(n-s)$th qubit used to decompose all the gates in $\{G_i\}_{i \in \{1, 2, \dots, 2^m-1\}}$. If $m \leq s \leq n-1$ we have:

$$Q_s(m) = \sum_{k=1}^{2^m-1} I(k, s) = 2^m - 1, \tag{A16}$$

since $I(k, s) = 1$ for the whole index range. If $0 \leq s \leq m-1$ we include $k = 0$ into the index range to simplify the combinatorial idea behind the following calculation:

$$Q_s(m) = \sum_{k=0}^{2^m-1} \left(\delta_{k_s,0} - \delta_{k \bmod 2^{s+1},0}\right) = 2^{m-1} - 2^{m-s-1}. \tag{A17}$$

Here we have used that $\delta_{b^k_{s+1},0} = \delta_{k \bmod 2^{s+1},0}$ by definition of $b^k_{s+1}$. Plugging everything into equation (A15), we get the claimed count. ■
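
As a sanity check on Corollary 19, the following sketch counts the MCGs directly from the indicator $I(k, s)$ and compares the result with the closed form (A13). It assumes, as in the proof above, that $b^k_{s+1} = k \bmod 2^{s+1}$ and that $k_s$ is bit $s$ of $k$; the function names are illustrative only.

```python
def indicator(k: int, s: int) -> int:
    """I(k, s) = 1 iff bit s of k is 0 and k mod 2^(s+1) is nonzero."""
    k_s = (k >> s) & 1
    b_next = k % (1 << (s + 1))
    return 1 if (k_s == 0 and b_next != 0) else 0


def q_brute(m: int, n: int) -> int:
    """Sum of I(k, s) over k = 1..2^m - 1 and s = 0..n - 1, cf. (A15)."""
    return sum(indicator(k, s) for k in range(1, 2 ** m) for s in range(n))


def q_closed(m: int, n: int) -> int:
    """Closed form (A13): 2^m (n - m/2 - 1) - n + m + 1.

    The product 2^m (2n - m - 2) is always even, so the integer
    division by 2 is exact.
    """
    return (2 ** m * (2 * n - m - 2)) // 2 - n + m + 1


if __name__ == "__main__":
    for n in range(1, 7):
        for m in range(0, n + 1):
            print(m, n, q_brute(m, n), q_closed(m, n))
```

The brute-force sum grows as $n\,2^m$, so it is only meant for small $m$ and $n$.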

Lemma 20 (Column-by-column decomposition) Let $V$ be an $m$ to $n$ isometry, described by a $2^n \times 2^m$ matrix, and $I_{2^n \times 2^m}$ denote the first $2^m$ columns of the $2^n \times 2^n$ identity matrix. There exist quantum gates $G_1, G_2, \dots, G_{2^m-1}$ of the same form as in Lemma 17 as well as a quantum gate $G_0$, which satisfies equation (A12) for an arbitrary $n$-qubit state $|\psi\rangle$, and a diagonal gate $\Delta$ acting on $m$ qubits, such that

$$G_0^\dagger G_1^\dagger \cdots G_{2^m-1}^\dagger \left(I^{\otimes(n-m)} \otimes \Delta^\dagger\right) I_{2^n \times 2^m} = V. \tag{A18}$$

Proof. Assume that we know a decomposition of a quantum gate $G$ into one-qubit and C-NOT gates. We can reverse its order and take the conjugate transpose of the one-qubit gates to get a decomposition of $G^\dagger$, since a C-NOT gate is its own inverse. In particular, $G^\dagger$ and $G$ can be implemented using the same number of C-NOTs. This allows us to replace equation (A18) by

$$I_{2^n \times 2^m} = \left(I^{\otimes(n-m)} \otimes \Delta\right) G_{2^m-1} G_{2^m-2} \cdots G_0 V. \tag{A19}$$

By definition of the gate $G_0$, we can choose it such that $G_0 V|0\rangle_m = e^{i\varphi_0}|0\rangle_n$, where $\varphi_0 \in \mathbb{R}$. Since the columns of an isometry are orthonormal and $G_0$ is unitary, the columns of $G_0 V$ are also orthonormal (for example, $|{}_n\langle 0|G_0 V|0\rangle_m| = 1$ implies that ${}_n\langle 0|G_0 V|1\rangle_m = 0$). We can therefore choose $G_1$ such that $G_1 G_0 V|1\rangle_m = e^{i\varphi_1}|1\rangle_n$, where $\varphi_1 \in \mathbb{R}$. By definition of $G_1$, $G_1 G_0 V|0\rangle_m = e^{i\varphi'_0}|0\rangle_n$, where $\varphi'_0 \in \mathbb{R}$. If we continue this procedure, we get $G_{2^m-1} G_{2^m-2} \cdots G_0 V|i\rangle_m = e^{i\varphi_i}|i\rangle_n$ for $i \in \{0, 1, \dots, 2^m-1\}$, where $\varphi_i \in \mathbb{R}$. We clear up the phases with a diagonal gate $\Delta$ acting on the $m$ least significant qubits, such that $(I^{\otimes(n-m)} \otimes \Delta)\,G_{2^m-1} G_{2^m-2} \cdots G_0 V|i\rangle_m = |i\rangle_n$ for $i \in \{0, 1, \dots, 2^m-1\}$, which is equivalent to equation (A19).


Theorem 21 (C-NOT count for an isometry) Let $m$ and $n$ be natural numbers with $n \geq 8$ and $V$ be an isometry from $m$ qubits to $n$ qubits. There exists a decomposition of $V$ in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies

$$N_{\mathrm{iso}}(m, n) \leq N_{SP}(n) + N_G(m, n) + N_\Delta(m), \tag{A20}$$

where $N_{SP}(n)$ denotes the number of C-NOTs required for state preparation on $n$ qubits starting from the state $|0\rangle^{\otimes n}$, $N_\Delta(m) \leq 2^m - 2$ denotes the number of C-NOTs required to decompose a diagonal gate acting on $m$ qubits [l^, and $N_G(m, n)$ is the number of C-NOTs used to decompose the gates in $\{G_i\}_{i \in \{1, 2, \dots, 2^m-1\}}$.

Proof. We decompose $V$ as described in Lemma 20 and $\{G_i\}_{i \in \{1, 2, \dots, 2^m-1\}}$ as in the proof of Lemma 17. By Lemma 18 we have

$$N_G(m, n) = \sum_{k=1}^{2^m-1} \left[2^n - n - 1 + Q^k(n)\,N_{C_{n-1}(U)}\right] = (2^m - 1)(2^n - n - 1) + Q(m, n)\,N_{C_{n-1}(U)},$$

where $Q(m, n) = 2^m\left(n - \frac{m}{2} - 1\right) - n + m + 1$ is the number of MCGs used, as given by Corollary 19, and $N_{C_{n-1}(U)}$ denotes the number of C-NOTs needed to decompose an MCG $C_{n-1}(U)$, given by Theorem 11. Note that we require $U \in SU(2)$ to use Theorem 11. This causes no problems in our construction, since Lemma 16 holds for $U \in SU(2)$. The gate $G_0^\dagger$ can be decomposed using a decomposition scheme for state preparation, which finishes the proof. ■


Corollary 22 (Explicit count for an isometry) The number of C-NOTs required to decompose an $m$ to $n \geq 8$ isometry $V$ satisfies

$$N_{\mathrm{iso}}(m, n) \leq 2^{m+n} - \frac{1}{24}\,2^n - 2 \cdot 2^{n/2} + 2^m\left(28n^2 + m(44 - 14n) - 117n + 88\right) - 28n^2 + m(28n - 88) + 117n - 87. \tag{A21}$$

Proof. Theorem 11 implies that $N_{C_{n-1}(U)} \leq 28n - 88$ for all $n$ (for simplicity we over-count in the case that $n$ is odd). The asymptotic best-known C-NOT counts for state preparation (see Table I) give us the upper bound $N_{SP}(n) \leq \frac{23}{24}\,2^n - 2 \cdot 2^{n/2} + 2$. The number of C-NOTs used to decompose a diagonal gate $\Delta$ acting on $m$ qubits is at most $N_\Delta(m) = 2^m - 2$ [l^. Using the inequality (A20) this leads to the claimed count. ■
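
The step from (A20) to (A21) is pure bookkeeping, and it can be checked by assembling the three contributions with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, the inputs are the bounds quoted in the proof above together with (A13), and $n$ is restricted to even values so that $2^{n/2}$ is an integer.

```python
from fractions import Fraction


def q_total(m: int, n: int) -> Fraction:
    """Number of MCGs, closed form (A13)."""
    return Fraction(2 ** m * (2 * n - m - 2), 2) - n + m + 1


def n_sp_bound(n: int) -> Fraction:
    """State-preparation bound (23/24) 2^n - 2 * 2^(n/2) + 2, for even n."""
    return Fraction(23, 24) * 2 ** n - 2 * 2 ** (n // 2) + 2


def n_g_bound(m: int, n: int) -> Fraction:
    """(2^m - 1)(2^n - n - 1) + Q(m, n) (28 n - 88), cf. proof of Theorem 21."""
    return (2 ** m - 1) * (2 ** n - n - 1) + q_total(m, n) * (28 * n - 88)


def bound_a20(m: int, n: int) -> Fraction:
    """Right-hand side of (A20) with N_Delta(m) = 2^m - 2."""
    return n_sp_bound(n) + n_g_bound(m, n) + (2 ** m - 2)


def bound_a21(m: int, n: int) -> Fraction:
    """Closed form (A21), for even n."""
    return (
        2 ** (m + n)
        - Fraction(1, 24) * 2 ** n
        - 2 * 2 ** (n // 2)
        + 2 ** m * (28 * n * n + m * (44 - 14 * n) - 117 * n + 88)
        - 28 * n * n
        + m * (28 * n - 88)
        + 117 * n
        - 87
    )


if __name__ == "__main__":
    for n in (8, 10, 12):
        for m in (1, 2, n):
            print(m, n, bound_a20(m, n), bound_a21(m, n))
```

Using `Fraction` avoids floating-point rounding in the $\frac{23}{24}$ and $\frac{1}{24}$ terms, so the two expressions can be compared for exact equality.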


4. Optimization of the decomposition of an 
isometry using the CSD 


Theorem 23 (Optimized CSD approach) Let $m$ and $n$ be natural numbers with $2 \leq m \leq n$ and $V$ be an isometry from $m$ qubits to $n$ qubits. There exists a decomposition of $V$ in terms of single-qubit gates and C-NOTs such that the number of C-NOT gates required satisfies

$$N_{\mathrm{iso}}(m, n) \leq \frac{23}{144}\left(4^m + 2 \cdot 4^n\right) - 2^{m-1} - 2^n + \frac{1}{3}(m - n + 4). \tag{A22}$$

Note that we recover the optimized C-NOT count for general quantum gates [12] by setting $n = m$ in the inequality (A22).

Proof. We optimize the C-NOT count of Section IV D using the two ideas described in the Appendix of [12].
There it is shown how one can combine the decompo¬ 
sition of the Gf{Ry) gates with neighbouring i-qubit- 
Gi{U) gates to save one C-not gate over what would be 
required if the Gf{Ry) gates were decomposed on their 
own. The essential idea is to use the circuit identity 


n — 2 



The same idea also works for the CSD adapted to isometries, allowing us to save one C-NOT per uniformly controlled $R_y$ gate.

To count the number of uniformly controlled $R_y$ gates $Q_{R_y}(m, n)$ used for an $m$ to $n$ isometry using the decomposition scheme of Section IV D, we use the following recursion relation:

$$Q_{R_y}(m, i+1) = Q_{R_y}(m, i) + \frac{2 \cdot 4^{i-2} - 2}{3} + 1 \quad \text{if } m \leq i < n, \tag{A23}$$
$$Q_{R_y}(m, m) = \frac{4^{m-2} - 1}{3}, \tag{A24}$$

where the last relation comes from Appendix A of [12]. Solving these gives

$$Q_{R_y}(m, n) = \frac{1}{144}\left(2^{2n+1} + 4^m\right) + \frac{1}{3}(n - m - 1). \tag{A25}$$

The CSD decomposition is used until the only generic unitaries that remain are on two qubits. In Appendix B of [12] it is shown how to save one C-NOT gate for each of the remaining two-qubit gates apart from one. Again this idea also works using the CSD adapted to isometries. The number of two-qubit gates $Q_{U_2}(m, n)$ arising in the decomposition scheme described in Section IV D satisfies the following recursion relation:

$$Q_{U_2}(m, i+1) = Q_{U_2}(m, i) + 2 \cdot 4^{i-2} \quad \text{if } m \leq i < n, \tag{A26}$$
$$Q_{U_2}(m, m) = 4^{m-2}, \tag{A27}$$

where the last of these relations is taken from Appendix B of [12]. Solving these gives

$$Q_{U_2}(m, n) = \frac{1}{48}\left(2^{2n+1} + 4^m\right). \tag{A28}$$

The optimized C-NOT count $\tilde{N}_{\mathrm{iso}}(m, n)$ is thus given by

$$\tilde{N}_{\mathrm{iso}}(m, n) = N_{\mathrm{iso}}(m, n) - Q_{R_y}(m, n) - Q_{U_2}(m, n) + 1, \tag{A29}$$

where $N_{\mathrm{iso}}(m, n)$ on the right-hand side is the unoptimized count, bounded by the inequality (IT^ . This leads to the claimed count. ■
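
The closed forms (A25) and (A28) can be checked against their recursions, and the bound (A22) can be evaluated for small cases. The following sketch is only an illustration under the formulas as stated above; all function names are hypothetical.

```python
from fractions import Fraction


def q_ry_recursive(m: int, n: int) -> Fraction:
    """Iterate (A23) starting from (A24); requires 2 <= m <= n."""
    q = Fraction(4 ** (m - 2) - 1, 3)
    for i in range(m, n):
        q += Fraction(2 * 4 ** (i - 2) - 2, 3) + 1
    return q


def q_ry_closed(m: int, n: int) -> Fraction:
    """Closed form (A25)."""
    return Fraction(2 ** (2 * n + 1) + 4 ** m, 144) + Fraction(n - m - 1, 3)


def q_u2_recursive(m: int, n: int) -> Fraction:
    """Iterate (A26) starting from (A27); requires 2 <= m <= n."""
    q = Fraction(4 ** (m - 2))
    for i in range(m, n):
        q += 2 * 4 ** (i - 2)
    return q


def q_u2_closed(m: int, n: int) -> Fraction:
    """Closed form (A28)."""
    return Fraction(2 ** (2 * n + 1) + 4 ** m, 48)


def n_iso_csd(m: int, n: int) -> Fraction:
    """Right-hand side of (A22); requires 2 <= m <= n."""
    return (
        Fraction(23, 144) * (4 ** m + 2 * 4 ** n)
        - 2 ** (m - 1)
        - 2 ** n
        + Fraction(m - n + 4, 3)
    )


if __name__ == "__main__":
    for m, n in [(2, 2), (2, 3), (3, 3), (3, 4), (4, 4)]:
        print(m, n, n_iso_csd(m, n))
```

For $m = 3$, $n = 4$ the bound evaluates to 73, the figure quoted for three to four isometries in Appendix B, and for $n = m$ it reduces to $\frac{23}{48}4^n - \frac{3}{2}2^n + \frac{4}{3}$.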


5. Optimized state preparation


For state preparation on two and three qubits there exist ad hoc methods using one and three C-NOT gates respectively [12]. For state preparation on $n \geq 4$ qubits we use the decomposition scheme described in [13]. In the case that $n$ is even, this uses the following iterative circuit:



where we have divided the qubits into two groups of $n/2$. In other words, state preparation on $n$ qubits is equivalent to state preparation on $n/2$ qubits, $n/2$ C-NOTs, and then two $n/2$-qubit unitary operations. If $n$ is odd, the unitary $U_1$ is replaced by an $\lfloor n/2 \rfloor$-qubit unitary and $U_2$ by an $\lfloor n/2 \rfloor$ to $\lfloor n/2 \rfloor + 1$ isometry.

If $n$ is odd we can implement $U_2$ using the CSD approach. Furthermore, we can use a similar technical trick as described in Appendix B of [12] to save one C-NOT gate when implementing $U_1$: as noted in Appendix B of [12], all apart from one of the two-qubit gates arising in the decomposition of a general unitary can be decomposed using two C-NOT gates. For the last one we can also extract a diagonal gate and merge it with the state preparation, since the diagonal gate commutes through the control qubits of the C-NOT gates that precede $U_1$.

In other words, for $n$ even, we have

$$N_{SP}(n) \leq N_{SP}\!\left(\tfrac{n}{2}\right) + \tfrac{n}{2} + 2N_{\mathrm{iso}}\!\left(\tfrac{n}{2}, \tfrac{n}{2}\right) - 1,$$
$$N_{SP}(n+1) \leq N_{SP}\!\left(\tfrac{n}{2}\right) + \tfrac{n}{2} + N_{\mathrm{iso}}\!\left(\tfrac{n}{2}, \tfrac{n}{2}\right) + N_{\mathrm{iso}}\!\left(\tfrac{n}{2}, \tfrac{n}{2} + 1\right) - 1, \tag{A30}$$

where for the purpose of evaluating $N_{\mathrm{iso}}$ in these counts, we use the inequality (A22). Starting from $N_{SP}(2) = 1$ and $N_{SP}(3) = 3$ [12], this allows us to iteratively compute $N_{SP}(n)$ for increasing $n$. For illustration purposes, the circuit for state preparation on 4 qubits is shown in the following circuit diagram.



Note that the depth of the circuit is, to leading order, the number of steps required to perform $U_2$, since $U_1$ and $U_2$ can be done in parallel and dominate the gate count.


Appendix B: Isometries on a small number of qubits


1. Isometries from one to two qubits

We present an ad hoc decomposition for a 1 to 2 isometry $V$ reaching the theoretical lower bound of two C-NOT gates. Our result is based on the following decomposition of an arbitrary two-qubit operator $U$ described in [12, [a [n.



We represent $V$ by a unitary matrix $V_2$ such that $V = V_2 I_{2^2 \times 2^1}$. Since we are only interested in the first two columns of $V_2$, we can replace the diagonal gate $\Delta$ of the last circuit by a single-qubit diagonal gate acting on the least significant qubit. Absorbing this gate into the neighbouring (arbitrary) single-qubit gate we conclude the following circuit equivalence.


2. Isometries leading to three qubit states

In this section we explain the steps needed to decompose isometries from $m$ to 3 qubits for $m = 1$ and $m = 2$.
Note that for m = 0 one can use the decomposition 
scheme for state preparation given in , and for 

m = 3 the decomposition scheme of (T^ . 


implement AC rather than C for each UCG C) and cor¬ 
rect for these at the end using a diagonal gate applied 
to the least significant qubit. Doing so we can save some 
C-NOTs, because for small n, we know how to implement 
AC“_i(t/) more efficiently than C„_i(t/). For example, 
we need 8 C-NOT gates to implement a C 2 , 3 (t/) gate (cf. 
Lemma [5] and [T]) and only 3 C-not gates to implement 
a AC 2 (C) gate (cf. Table HvT) . 


a. Isometries from one to three qubits 


We use the column-by-column approach described in Section IV C to decompose an isometry $V$ from one to three qubits. As in Section IV we represent the $8 \times 2$ matrix corresponding to $V$ by an $8 \times 8$ unitary matrix $G^\dagger$ by writing $V = G^\dagger I_{8 \times 2}$. The unitary $G_0^\dagger$ (defined in Section IV C) corresponds to state preparation on three qubits ($G_0^\dagger|0\rangle^{\otimes 3} = V|0\rangle =: |\psi_0\rangle$) and can therefore be

implemented with the techniques described in [I^ 113 ■ 
We now consider constructing a circuit for the unitary 
Gi. We define := GoD|l) and note that its first 
entry is zero. One can use Lemma 0] to choose the gates 
depicted in the circuit diagram below such that they have 
the following action on (as previously “=(=’ represents 
an arbitrary complex entry): 



We implement each UCG together with its subsequent 
diagonal gate as described in . Together with the circuit for the unitary $G_0^\dagger$, this leads to the following circuit for the isometry $V$



where we have not depicted the single-qubit gates for simplicity.


b. Isometries from two to three qubits


We use the CSD approach described in Section IV D to decompose an isometry, $V$, from two to three qubits. As in Section IV we represent the $8 \times 4$ matrix corresponding to $V$ by an $8 \times 8$ unitary matrix $G^\dagger$, by writing $V = G^\dagger I_{8 \times 4}$. Then we apply Theorem 10 of [13] to $G^\dagger$, which gives us


| 0 ) 

2 




Note that all the gates in the circuit above act trivially on the state $|0\rangle^{\otimes 3}$. Therefore this represents a valid circuit for the unitary $G_1$.

Remark 8 The notation in the circuit diagram above is as introduced in the general case in Section IV C. The difference between the circuit above and the circuit we would get by the techniques of Section IV C is that we switch the order of the UCG and the MCG (note that they commute by construction) and leave away some controls of the MCGs. Indeed, similar simplifications are possible for most of the MCGs that arise in the column-by-column decomposition of arbitrary isometries from $m$ to $n$ qubits. We have not taken this into account in the general C-NOT count, since it does not affect its leading order.


where each of the symbols $A$ and $B$ is a placeholder for two two-qubit unitaries denoted by $\{A_0, A_1\}$ or $\{B_0, B_1\}$ respectively. Since we can assume that the first qubit is initially in the state $|0\rangle$, we always implement $A_0$ on the last two qubits at the start of the circuit (on the right-hand side) above. Therefore we can simplify the above circuit.


We apply Theorem 8 of [T^ to the uniformly controlled 
Ry gate. Together with Appendix A of [H this leads to 
the following circuit for the isometry V 


|0) - 
2 V 



Since MCGs are a special case of UCGs, we can implement the MCGs using UCGs instead. Furthermore, we can implement all the UCGs up to diagonal gates (i.e.,

FIG. 5: Implementing the second column of an isometry $V$ from one to four qubits with optimized controlling of the MCGs. Note that all gates act trivially on $|0000\rangle$. The symbol denotes an arbitrary complex number.


where we can absorb the Ry{^) and Ry{—^) gates into 
the neighbouring uniformly controlled Ry gates. We ap¬ 
ply Theorem 12 of [l2| to the last uniformly controlled 
gate in the circuit above, which gives us two two-qubit 
unitaries U and W and the following circuit for the iso¬ 
metry V. 



Decomposing the uniformly controlled rotations as described in [13] and using the techniques described in Appendix B of leads to the following circuit for $V$



a. Isometries from one to four qubits

As in Section IV we represent the $16 \times 2$ matrix corresponding to $V$ by a $16 \times 16$ unitary matrix $G^\dagger$ by writing $V = G^\dagger I_{16 \times 2}$. The unitary $G_0^\dagger$ (defined in Section IV C) corresponds to state preparation on four qubits ($G_0^\dagger|0\rangle^{\otimes 4} = V|0\rangle =: |\psi_0\rangle$) and can therefore be implemented with the techniques described in Appendix A 5 with 8 C-NOTs. We construct the unitary $G_1$ in a similar fashion as in the case of a one to three isometry (cf. Appendix B 2 a) using the column-by-column approach described in Section IV C. This leads to a circuit for the unitary $G_1$ given in Fig. 5. We implement all MCGs of the circuit for $G_1$ with UCGs up to diagonal gates by the techniques described in [l^ and correct for this at the end of the circuit with a diagonal gate acting on the least significant qubit (cf. Section B 2 a). Therefore we use 22 C-NOTs to implement an isometry from 1 to 4 qubits.
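
The figure of 8 C-NOTs for four-qubit state preparation can be reproduced from the recursion (A30) of Appendix A 5 together with the bound (A22). The sketch below is a hedged illustration: it simply iterates the right-hand sides of (A30), treating them as the counts, with the starting values $N_{SP}(2) = 1$ and $N_{SP}(3) = 3$; the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def n_iso_csd(m: int, n: int) -> Fraction:
    """Right-hand side of (A22); requires 2 <= m <= n."""
    return (
        Fraction(23, 144) * (4 ** m + 2 * 4 ** n)
        - 2 ** (m - 1)
        - 2 ** n
        + Fraction(m - n + 4, 3)
    )


@lru_cache(maxsize=None)
def n_sp(n: int) -> Fraction:
    """State-preparation count obtained by iterating (A30) for n >= 2."""
    if n == 2:
        return Fraction(1)
    if n == 3:
        return Fraction(3)
    half = n // 2
    if n % 2 == 0:
        # n even: two half-size unitaries, one C-NOT saved.
        return n_sp(half) + half + 2 * n_iso_csd(half, half) - 1
    # n = 2 * half + 1: one unitary on `half` qubits and one
    # isometry from `half` to `half + 1` qubits.
    return (
        n_sp(half)
        + half
        + n_iso_csd(half, half)
        + n_iso_csd(half, half + 1)
        - 1
    )


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, n_sp(n))
```

For $n = 4$ this gives $1 + 2 + 2 \cdot 3 - 1 = 8$, in agreement with the count used above.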


b. Isometries from two to four qubits 


where the single-qubit gates are not depicted for simplic¬ 
ity. 


3. Isometries leading to four qubit states 


As in Section IV we represent the $16 \times 4$ matrix corresponding to $V$ by a $16 \times 16$ unitary matrix $G^\dagger$ by writing $V = G^\dagger I_{16 \times 4}$. We can construct the unitaries $G_0$ and $G_1$ as described in Appendix B 3 a. Similarly we find the following circuit for the unitary $G_2$


In this section we explain the steps needed to decompose isometries from $m$ to 4 qubits for $m = 1$ and $m = 2$. Note that for $m = 0$ one can use the decomposition scheme for state preparation described in Appendix A 5 and for $m = 4$ the decomposition scheme of [l^. The case $m = 3$ can be done with the CSD approach requiring 73 C-NOTs (cf. equation (A22), and Appendix B 2 b for an example using the CSD approach).



and the following circuit for the unitary $G_3$.

Note that two controls are required for the MCG for the unitary $G_3$, such that $G_3$ acts trivially on the states $|0000\rangle$, $|0001\rangle$ and $|0010\rangle$.

We implement all MCGs with UCGs up to diagonal gates by the techniques described in [l^ and correct for this at the end of the circuit with a diagonal gate acting on the two least significant qubits. Since a diagonal gate on two qubits requires 2 C-NOT gates [I^, we conclude that we need 54 C-NOTs to implement a two to four isometry.